THE THEORY OF 
MATRICES 


F. R. GANTMACHER


VOLUME ONE 


1959 


PREFACE


THE MATRIX CALCULUS is widely applied nowadays in various branches of 
mathematics, mechanics, theoretical physics, theoretical electrical engineer- 
ing, etc. However, neither in the Soviet nor the foreign literature is there a 
book that gives a sufficiently complete account of the problems of matrix 
theory and of its diverse applications. The present book is an attempt to fill 
this gap in the mathematical literature. 

The book is based on lecture courses on the theory of matrices and its
applications that the author has given several times in the course of the last 
seventeen years at the Universities of Moscow and Tiflis and at the Moscow 
Institute of Physical Technology. 

The book is meant not only for mathematicians (undergraduates and 
research students) but also for specialists in allied fields (physics,
engineering) who are interested in mathematics and its applications. Therefore
the author has endeavoured to make his account of the material as accessible
as possible, assuming only that the reader is acquainted with the theory of
determinants and with the usual course of higher mathematics within the
programme of higher technical education. Only a few isolated sections in
the last chapters of the book require additional mathematical knowledge on
the part of the reader. Moreover, the author has tried to keep the
individual chapters as far as possible independent of each other. For example,
Chapter V, Functions of Matrices, does not depend on the material con- 
tained in Chapters II and III. At those places of Chapter V where funda- 
mental concepts introduced in Chapter IV are being used for the first time, 
the corresponding references are given. Thus, a reader who is acquainted 
with the rudiments of the theory of matrices can immediately begin with 
reading the chapters that interest him. 

The book consists of two parts, containing fifteen chapters. 

In Chapters I and III, information about matrices and linear operators
is developed ab initio and the connection between operators and matrices
is introduced.

Chapter II expounds the theoretical basis of Gauss’s elimination method 
and certain associated effective methods of solving a system of n linear 
equations, for large n. In this chapter the reader also becomes acquainted 
with the technique of operating with matrices that are divided into
rectangular ‘blocks.’




In Chapter IV we introduce the extremely important ‘characteristic’ 
and ‘minimal’ polynomials of a square matrix, and the ‘adjoint’ and ‘reduced 
adjoint’ matrices. 

In Chapter V, which is devoted to functions of matrices, we give the 
general definition of f(A) as well as concrete methods of computing it,
where f(λ) is a function of a scalar argument λ and A is a square matrix.
The concept of a function of a matrix is used in §§ 5 and 6 of this chapter
for a complete investigation of the solutions of a system of linear
differential equations of the first order with constant coefficients. Both the
concept of a function of a matrix and this latter investigation of differential
equations are based entirely on the concept of the minimal polynomial of a
matrix and, in contrast to the usual exposition, do not use the so-called theory
of elementary divisors, which is treated in Chapters VI and VII.

These five chapters constitute a first course on matrices and their appli- 
cations. Very important problems in the theory of matrices arise in con- 
nection with the reduction of matrices to a normal form. This reduction 
is carried out on the basis of Weierstrass’ theory of elementary divisors.
In view of the importance of this theory we give two expositions in this
book: an analytic one in Chapter VI and a geometric one in Chapter VII.
We draw the reader’s attention to §§ 7 and 8 of Chapter VI, where we study
effective methods of finding a matrix that transforms a given matrix to
normal form. In § 8 of Chapter VII we investigate in detail the method
of A. N. Krylov for the practical computation of the coefficients of the
characteristic polynomial. 

In Chapter VIII certain types of matrix equations are solved. We also 
consider here the problem of determining all the matrices that are permutable 
with a given matrix and we study in detail the many-valued functions of 
matrices ᵐ√A and ln A.

Chapters IX and X deal with the theory of linear operators in a unitary
space and the theory of quadratic and hermitian forms. These chapters do
not depend on Weierstrass’ theory of elementary divisors and use, of the
preceding material, only the basic information on matrices and linear
operators contained in the first three chapters of the book. In § 9 of Chapter X
we apply the theory of forms to the study of the principal oscillations of a 
system with n degrees of freedom. In § 11 of this chapter we give an account 
of Frobenius’ deep results on the theory of Hankel forms. These results are 
used later, in Chapter XV, to study special cases of the Routh-Hurwitz 
problem. 

The last five chapters form the second part of the book [the second 
volume, in the present English translation]. In Chapter XI we determine 
normal forms for complex symmetric, skew-symmetric, and orthogonal
matrices and establish interesting connections of these matrices with real matrices
of the same classes and with unitary matrices. 

In Chapter XII we expound the general theory of pencils of matrices of
the form A + λB, where A and B are arbitrary rectangular matrices of the
same dimensions. Just as the study of regular pencils of matrices A + λB
is based on Weierstrass’ theory of elementary divisors, so the study of
singular pencils is built upon Kronecker’s theory of minimal indices, which is, as
it were, a further development of Weierstrass’s theory. By means of
Kronecker’s theory (the author believes that he has succeeded in simplifying the
exposition of this theory) we establish in Chapter XII canonical forms of
the pencil of matrices A + λB in the most general case. The results obtained
there are applied to the study of systems of linear differential equations 
with constant coefficients. 

In Chapter XIII we explain the remarkable spectral properties of mat- 
rices with non-negative elements and consider two important applications 
of matrices of this class: 1) homogeneous Markov chains in the theory of 
probability and 2) oscillatory properties of elastic vibrations in mechanics. 
The matrix method of studying homogeneous Markov chains was developed 
in the book [25] by V. I. Romanovskii and is based on the fact that the matrix 
of transition probabilities in a homogeneous Markov chain with a finite 
number of states is a matrix with non-negative elements of a special type 
(a ‘stochastic’ matrix). 

The oscillatory properties of elastic vibrations are connected with another 
important class of non-negative matrices, the ‘oscillation matrices.’ These
matrices and their applications were studied by M. G. Krein jointly with 
the author of this book. In Chapter XIII, only certain basic results in this 
domain are presented. The reader can find a detailed account of the whole 
material in the monograph [7]. 

In Chapter XIV we compile the applications of the theory of matrices 
to systems of differential equations with variable coefficients. The central 
place (§§ 5-9) in this chapter belongs to the theory of the multiplicative 
integral (Produktintegral) and its connection with Volterra’s infinitesimal 
calculus. These problems are almost entirely unknown in Soviet
mathematical literature. In the first sections and in § 11, we study reducible
systems (in the sense of Lyapunov) in connection with the problem of
stability of motion; we also give certain results of N. P. Erugin. Sections 9-11
refer to the analytic theory of systems of differential equations. Here we
clarify an inaccuracy in Birkhoff’s fundamental theorem, which is usually
applied to the investigation of the solution of a system of differential
equations in the neighborhood of a singular point, and we establish a canonical
form of the solution in the case of a regular singular point.




In § 12 of Chapter XIV we give a brief survey of some results of the 
fundamental investigations of I. A. Leppo-Danilevskii on analytic functions 
of several matrices and their applications to differential systems. 

The last chapter, Chapter XV, deals with the applications of the theory 
of quadratic forms (in particular, of Hankel forms) to the Routh-Hurwitz 
problem of determining the number of roots of a polynomial in the right 
half-plane (Re z > 0). The first sections of the chapter contain the classical
treatment of the problem. In § 5 we give the theorem of A. M. Lyapunov,
which establishes a stability criterion equivalent to the Routh-Hurwitz
criterion. Together with the stability criterion of Routh-Hurwitz we give,
in § 11 of this chapter, the comparatively little known criterion of Liénard
and Chipart, in which the number of determinant inequalities is only about
half of that in the Routh-Hurwitz criterion.

At the end of Chapter XV we exhibit the close connection between stabil- 
ity problems and two remarkable theorems of A. A. Markov and P. L. 
Chebyshev, which were obtained by these celebrated authors on the basis of the 
expansion of certain continued fractions of special types in series of
decreasing powers of the argument. Here we give a matrix proof of these theorems.

This, then, is a brief summary of the contents of this book. 


F. R. Gantmacher 


PUBLISHERS’ PREFACE 


THE PUBLISHERS WISH TO thank Professor Gantmacher for his kindness in
communicating to the translator new versions of several paragraphs of the 
original Russian language book. 

The Publishers also take pleasure in thanking the VEB Deutscher Verlag 
der Wissenschaften, whose many published translations of Russian scientific 
books into the German language include a counterpart of the present work, 
for their kind spirit of cooperation in agreeing to the use of their formulas
in the preparation of the present work. 

No material changes have been made in the text in translating the present 
work from the Russian except for the replacement of several paragraphs by 
the new versions supplied by Professor Gantmacher. Some changes in the 
references and in the Bibliography have been made for the benefit of the 
English-language reader. 


CONTENTS 


PREFACE
PUBLISHERS’ PREFACE

I. MATRICES AND OPERATIONS ON MATRICES
§ 1. Matrices. Basic notation
§ 2. Addition and multiplication of rectangular matrices
§ 3. Square matrices
§ 4. Compound matrices. Minors of the inverse matrix

II. THE ALGORITHM OF GAUSS AND SOME OF ITS APPLICATIONS
§ 1. Gauss’s elimination method
§ 2. Mechanical interpretation of Gauss’s algorithm
§ 3. Sylvester’s determinant identity
§ 4. The decomposition of a square matrix into triangular factors
§ 5. The partition of a matrix into blocks. The technique of operating with partitioned matrices. The generalized algorithm of Gauss

III. LINEAR OPERATORS IN AN n-DIMENSIONAL VECTOR SPACE
§ 1. Vector spaces
§ 2. A linear operator mapping an n-dimensional space into an m-dimensional space
§ 3. Addition and multiplication of linear operators
§ 4. Transformation of coordinates
§ 5. Equivalent matrices. The rank of an operator. Sylvester’s inequality
§ 6. Linear operators mapping an n-dimensional space into itself
§ 7. Characteristic values and characteristic vectors of a linear operator
§ 8. Linear operators of simple structure

IV. THE CHARACTERISTIC POLYNOMIAL AND THE MINIMAL POLYNOMIAL OF A MATRIX
§ 1. Addition and multiplication of matrix polynomials
§ 2. Right and left division of matrix polynomials
§ 3. The generalized Bézout theorem
§ 4. The characteristic polynomial of a matrix. The adjoint matrix
§ 5. The method of Faddeev for the simultaneous computation of the coefficients of the characteristic polynomial and of the adjoint matrix
§ 6. The minimal polynomial of a matrix

V. FUNCTIONS OF MATRICES
§ 1. Definition of a function of a matrix
§ 2. The Lagrange-Sylvester interpolation polynomial
§ 3. Other forms of the definition of f(A). The components of the matrix A
§ 4. Representation of functions of matrices by means of series
§ 5. Application of a function of a matrix to the integration of a system of linear differential equations with constant coefficients
§ 6. Stability of motion in the case of a linear system

VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES. ANALYTIC THEORY OF ELEMENTARY DIVISORS
§ 1. Elementary transformations of a polynomial matrix
§ 2. Canonical form of a λ-matrix
§ 3. Invariant polynomials and elementary divisors of a polynomial matrix
§ 4. Equivalence of linear binomials
§ 5. A criterion for similarity of matrices
§ 6. The normal forms of a matrix
§ 7. The elementary divisors of the matrix f(A)
§ 8. A general method of constructing the transforming matrix
§ 9. Another method of constructing a transforming matrix

VII. THE STRUCTURE OF A LINEAR OPERATOR IN AN n-DIMENSIONAL SPACE
§ 1. The minimal polynomial of a vector and a space (with respect to a given linear operator)
§ 2. Decomposition into invariant subspaces with co-prime minimal polynomials
§ 3. Congruence. Factor space
§ 4. Decomposition of a space into cyclic invariant subspaces
§ 5. The normal form of a matrix
§ 6. Invariant polynomials. Elementary divisors
§ 7. The Jordan normal form of a matrix
§ 8. Krylov’s method of transforming the secular equation

VIII. MATRIX EQUATIONS
§ 1. The equation AX = XB
§ 2. The special case A = B. Commuting matrices
§ 3. The equation AX − XB = C
§ 4. The scalar equation f(X) = 0
§ 5. Matrix polynomial equations
§ 6. The extraction of m-th roots of a non-singular matrix
§ 7. The extraction of m-th roots of a singular matrix
§ 8. The logarithm of a matrix

IX. LINEAR OPERATORS IN A UNITARY SPACE
§ 1. General considerations
§ 2. Metrization of a space
§ 3. Gram’s criterion for linear dependence of vectors
§ 4. Orthogonal projection
§ 5. The geometrical meaning of the Gramian and some inequalities
§ 6. Orthogonalization of a sequence of vectors
§ 7. Orthonormal bases
§ 8. The adjoint operator
§ 9. Normal operators in a unitary space
§ 10. The spectra of normal, hermitian, and unitary operators
§ 11. Positive-semidefinite and positive-definite hermitian operators
§ 12. Polar decomposition of a linear operator in a unitary space. Cayley’s formulas
§ 13. Linear operators in a euclidean space
§ 14. Polar decomposition of an operator and the Cayley formulas in a euclidean space
§ 15. Commuting normal operators

X. QUADRATIC AND HERMITIAN FORMS
§ 1. Transformation of the variables in a quadratic form
§ 2. Reduction of a quadratic form to a sum of squares. The law of inertia
§ 3. The methods of Lagrange and Jacobi of reducing a quadratic form to a sum of squares
§ 4. Positive quadratic forms
§ 5. Reduction of a quadratic form to principal axes
§ 6. Pencils of quadratic forms
§ 7. Extremal properties of the characteristic values of a regular pencil of forms
§ 8. Small oscillations of a system with n degrees of freedom
§ 9. Hermitian forms
§ 10. Hankel forms

BIBLIOGRAPHY


CHAPTER I 


MATRICES AND OPERATIONS ON MATRICES 


§ 1. Matrices. Basic Notation 


1. Let F be a given number field.¹


DEFINITION 1: A rectangular array of numbers of the field F

$$\begin{Vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{Vmatrix} \tag{1}$$

is called a matrix. When m = n, the matrix is called square and the number
m, equal to n, is called its order. In the general case the matrix is called
rectangular (of dimension m × n). The numbers that constitute the matrix
are called its elements.


Notation: In the double-subscript notation for the elements, the first 
subscript always denotes the row and the second subscript the column con- 
taining the given element. 

As an alternative to the notation (1) for a matrix we shall also use the 
abbreviation 

$$\| a_{ik} \| \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n). \tag{2}$$

Often the matrix (1) will also be denoted by a single letter, for example A.
If A is a square matrix of order n, then we shall write $A = \| a_{ik} \|_1^n$. The
determinant of a square matrix $A = \| a_{ik} \|_1^n$ will be denoted by $| a_{ik} |_1^n$ or by
$|A|$.


¹ A number field is defined as an arbitrary collection of numbers within which the four
operations of addition, subtraction, multiplication, and division by a non-zero number
can always be carried out.

Examples of number fields are: the set of all rational numbers, the set of all real
numbers, and the set of all complex numbers.

All the numbers that will occur in the sequel are assumed to belong to the number 
field given initially. 




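We introduce a concise notation for determinants formed from elements
of the given matrix:

$$A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} =
\begin{vmatrix}
a_{i_1k_1} & a_{i_1k_2} & \cdots & a_{i_1k_p} \\
a_{i_2k_1} & a_{i_2k_2} & \cdots & a_{i_2k_p} \\
\vdots     & \vdots     &        & \vdots \\
a_{i_pk_1} & a_{i_pk_2} & \cdots & a_{i_pk_p}
\end{vmatrix}. \tag{3}$$

The determinant (3) is called a minor of A of order p, provided
$1 \le i_1 < i_2 < \cdots < i_p \le m$ and $1 \le k_1 < k_2 < \cdots < k_p \le n$. A rectangular
matrix $A = \| a_{ik} \|$ $(i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n)$ has $\binom{m}{p}\binom{n}{p}$ minors
of order p

$$A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} \qquad
\left(\begin{matrix} 1 \le i_1 < i_2 < \cdots < i_p \le m \\ 1 \le k_1 < k_2 < \cdots < k_p \le n \end{matrix};\quad p \le \min(m, n)\right). \tag{3'}$$

The minors (3') in which $i_1 = k_1,\ i_2 = k_2,\ \ldots,\ i_p = k_p$ are called principal
minors.

In the notation (3) the determinant of a square matrix $A = \| a_{ik} \|_1^n$ can
be written as follows:

$$|A| = A\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix}.$$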


The largest among the orders of the non-zero minors generated by a 
matrix is called the rank of the matrix. If r is the rank of a rectangular
matrix A of dimension m × n, then obviously r ≤ min(m, n).
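The definitions of a minor (3) and of the rank can be tried out directly. The Python sketch below is ours, not the book's: the helper names `det`, `minor` and `rank_by_minors` are hypothetical, determinants are computed exactly over the rationals with `fractions.Fraction`, and the rank is found by brute force, examining the minors of each order p from min(m, n) downward. Since there are $\binom{m}{p}\binom{n}{p}$ minors of order p, this search is meant only for small illustrative matrices.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical helper names; not from the book.


def det(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Look for a non-zero pivot in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row interchange changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def minor(A, rows, cols):
    """A(i_1 ... i_p; k_1 ... k_p) of (3); indices are 1-based as in the text."""
    return det([[A[i - 1][k - 1] for k in cols] for i in rows])


def rank_by_minors(A):
    """Largest order of a non-zero minor of A (0 for the zero matrix)."""
    m, n = len(A), len(A[0])
    for p in range(min(m, n), 0, -1):
        for rows in combinations(range(1, m + 1), p):
            for cols in combinations(range(1, n + 1), p):
                if minor(A, rows, cols) != 0:
                    return p
    return 0


A = [[1, 2, 3],
     [2, 4, 6]]                     # second row = 2 * first row
print(minor(A, (1, 2), (1, 3)))     # 0
print(rank_by_minors(A))            # 1
```

In practice the rank is computed by elimination alone; the brute-force search is only a literal transcription of the definition.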

A rectangular matrix consisting of a single column

$$\begin{Vmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{Vmatrix}$$

is called a column matrix and will be denoted by $(x_1, x_2, \ldots, x_n)$.
A rectangular matrix consisting of a single row

$$\| z_1 \ \ z_2 \ \ \cdots \ \ z_n \|$$

is called a row matrix and will be denoted by $[z_1, z_2, \ldots, z_n]$.
A square matrix in which all the elements outside the main diagonal
are zero

$$\begin{Vmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_n
\end{Vmatrix}$$

is called a diagonal matrix and is denoted by² $\| d_i \delta_{ik} \|_1^n$ or by
$\{ d_1, d_2, \ldots, d_n \}$.


2. Suppose that m quantities $y_1, y_2, \ldots, y_m$ have linear and homogeneous
expressions in terms of n other quantities $x_1, x_2, \ldots, x_n$:


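$$\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n, \\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n, \\
    &\;\;\vdots \\
y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n,
\end{aligned} \tag{4}$$

or more concisely,

$$y_i = \sum_{k=1}^{n} a_{ik} x_k \qquad (i = 1, 2, \ldots, m). \tag{4'}$$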


The transformation of the quantities $x_1, x_2, \ldots, x_n$ into the quantities
$y_1, y_2, \ldots, y_m$ by means of the formulas (4) is called a linear transformation.

The coefficients of this transformation form a rectangular matrix (1) 
of dimension m X n. 

The linear transformation (4) determines the matrix (1) uniquely, and 
vice versa. 

In the next section we shall define the basic operations on rectangular
matrices using the properties of the linear transformations (4) as our
starting point.

² Here $\delta_{ik}$ is the Kronecker symbol: $\delta_{ik} = 1$ for $i = k$ and $\delta_{ik} = 0$ for $i \ne k$.


§ 2. Addition and Multiplication of Rectangular Matrices


We shall define the basic operations on matrices: addition of matrices,
multiplication of a matrix by a number, and multiplication of matrices.


1. Suppose that the quantities $y_1, y_2, \ldots, y_m$ are expressed in terms of
the quantities $x_1, x_2, \ldots, x_n$ by means of the linear transformation

$$y_i = \sum_{k=1}^{n} a_{ik} x_k \qquad (i = 1, 2, \ldots, m) \tag{5}$$

and the quantities $z_1, z_2, \ldots, z_m$ in terms of the same quantities $x_1, x_2, \ldots, x_n$
by means of the transformation

$$z_i = \sum_{k=1}^{n} b_{ik} x_k \qquad (i = 1, 2, \ldots, m). \tag{6}$$

Then

$$y_i + z_i = \sum_{k=1}^{n} (a_{ik} + b_{ik}) x_k \qquad (i = 1, 2, \ldots, m). \tag{7}$$

In accordance with this, we formulate the following definition.
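DEFINITION 2: The sum of two rectangular matrices $A = \| a_{ik} \|$ and
$B = \| b_{ik} \|$, both of dimension m × n, is the matrix $C = \| c_{ik} \|$, of the same
dimension, whose elements are the sums of the corresponding elements of the
given matrices:

$$C = A + B,$$

where

$$c_{ik} = a_{ik} + b_{ik} \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n).$$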


The operation of forming the sum of given matrices is called addition. 
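Example.

$$\begin{Vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{Vmatrix} +
\begin{Vmatrix} c_1 & c_2 & c_3 \\ d_1 & d_2 & d_3 \end{Vmatrix} =
\begin{Vmatrix} a_1 + c_1 & a_2 + c_2 & a_3 + c_3 \\ b_1 + d_1 & b_2 + d_2 & b_3 + d_3 \end{Vmatrix}.$$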


According to Definition 2, only rectangular matrices of equal dimension 
can be added. 

By virtue of the same definition, the coefficient matrix of the transforma- 
tion (7) is the sum of the coefficient matrices of the transformations (5) 
and (6). 
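As a minimal sketch (matrices as plain Python lists of rows; `mat_add` and `transform` are our own hypothetical helpers), Definition 2 and the remark about transformation (7) can be checked directly: applying the sum of the coefficient matrices gives the same result as adding the transformed quantities.

```python
# Hypothetical helper names; not from the book.


def mat_add(A, B):
    """C = A + B with c_ik = a_ik + b_ik (Definition 2)."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("only matrices of equal dimension can be added")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def transform(M, x):
    """Apply the linear transformation y_i = sum_k m_ik x_k to the list x."""
    return [sum(m * xk for m, xk in zip(row, x)) for row in M]


A = [[1, 2, 3], [4, 5, 6]]   # coefficients of (5)
B = [[6, 5, 4], [3, 2, 1]]   # coefficients of (6)
x = [1, -1, 2]

y, z = transform(A, x), transform(B, x)
print([yi + zi for yi, zi in zip(y, z)])   # [14, 14]
print(transform(mat_add(A, B), x))         # [14, 14], as (7) predicts
```

The explicit dimension check mirrors the remark that only matrices of equal dimension can be added.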

From the definition of matrix addition it follows immediately that this 
operation has the properties of commutativity and associativity:

1. A + B = B + A;
2. (A + B) + C = A + (B + C).


Here A, B, and C are arbitrary rectangular matrices all of equal dimension. 


The operation of addition of matrices extends in a natural way to the
case of an arbitrary finite number of summands.


2. Let us multiply the quantities $y_1, y_2, \ldots, y_m$ in the transformation
(5) by some number α of F. Then

$$\alpha y_i = \sum_{k=1}^{n} (\alpha a_{ik}) x_k \qquad (i = 1, 2, \ldots, m).$$

In accordance with this, we formulate the following definition.


DEFINITION 3. The product of a matrix $A = \| a_{ik} \|$ $(i = 1, 2, \ldots, m;$
$k = 1, 2, \ldots, n)$ by a number α of F is the matrix $C = \| c_{ik} \|$ $(i = 1, 2, \ldots, m;$
$k = 1, 2, \ldots, n)$ whose elements are obtained from the corresponding
elements of A by multiplication by α:

$$C = \alpha A,$$

where

$$c_{ik} = \alpha a_{ik} \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n).$$


The operation of forming the product of a matrix by a number is called
multiplication of the matrix by the number.


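Example.

$$\alpha \begin{Vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{Vmatrix} =
\begin{Vmatrix} \alpha a_1 & \alpha a_2 & \alpha a_3 \\ \alpha b_1 & \alpha b_2 & \alpha b_3 \end{Vmatrix}.$$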


It is easy to see that 

1. α(A + B) = αA + αB,
2. (α + β)A = αA + βA,
3. (αβ)A = α(βA).

Here A and B are rectangular matrices of equal dimension and α and β are
numbers of F.

The difference A − B of two rectangular matrices of equal dimension
is defined by

$$A - B = A + (-1)B.$$

If A is a square matrix of order n and α a number of F, then³

$$|\alpha A| = \alpha^n |A|.$$
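A quick numerical check of Definition 3, of the difference A − B = A + (−1)B, and of the rule $|\alpha A| = \alpha^n |A|$, again as a sketch with hypothetical helper names. The rule holds because multiplying A by α multiplies each of its n rows by α, and each row contributes one factor α to the determinant. The determinant here is expanded along the first row, which is simple but costs on the order of n! operations, so it suits only small n.

```python
# Hypothetical helper names; not from the book.


def scalar_mul(alpha, A):
    """alpha * A: every element of A is multiplied by alpha (Definition 3)."""
    return [[alpha * a for a in row] for row in A]


def mat_sub(A, B):
    """A - B, defined as A + (-1)B."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, scalar_mul(-1, B))]


def det(A):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det([row[:k] + row[k + 1:] for row in A[1:]])
               for k in range(len(A)))


A = [[2, 1, 0],
     [1, 3, 4],
     [0, 5, 6]]
alpha, n = 3, len(A)
print(det(A))                                               # -10
print(det(scalar_mul(alpha, A)) == alpha ** n * det(A))     # True
print(mat_sub([[5, 5]], [[2, 3]]))                          # [[3, 2]]
```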
³ Here the symbols $|A|$ and $|\alpha A|$ denote the determinants of the matrices A and αA
(see p. 1).

3. Suppose that the quantities $z_1, z_2, \ldots, z_m$ are expressed in terms of
the quantities $y_1, y_2, \ldots, y_n$ by the transformation

$$z_i = \sum_{k=1}^{n} a_{ik} y_k \qquad (i = 1, 2, \ldots, m) \tag{8}$$

and that the quantities $y_1, y_2, \ldots, y_n$ are expressed in terms of the quantities
$x_1, x_2, \ldots, x_q$ by the formulas

$$y_k = \sum_{j=1}^{q} b_{kj} x_j \qquad (k = 1, 2, \ldots, n). \tag{9}$$

Then on substituting these expressions for the $y_k$ $(k = 1, 2, \ldots, n)$ in (8) we
can express $z_1, z_2, \ldots, z_m$ in terms of $x_1, x_2, \ldots, x_q$ by means of the composite
transformation:

$$z_i = \sum_{k=1}^{n} a_{ik} \sum_{j=1}^{q} b_{kj} x_j = \sum_{j=1}^{q} \Bigl( \sum_{k=1}^{n} a_{ik} b_{kj} \Bigr) x_j \qquad (i = 1, 2, \ldots, m). \tag{10}$$

In accordance with this we formulate the following definition.


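DEFINITION 4. The product of two rectangular matrices

$$A = \begin{Vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{Vmatrix}, \qquad
B = \begin{Vmatrix}
b_{11} & b_{12} & \cdots & b_{1q} \\
b_{21} & b_{22} & \cdots & b_{2q} \\
\vdots & \vdots &        & \vdots \\
b_{n1} & b_{n2} & \cdots & b_{nq}
\end{Vmatrix}$$

is the matrix

$$C = \begin{Vmatrix}
c_{11} & c_{12} & \cdots & c_{1q} \\
c_{21} & c_{22} & \cdots & c_{2q} \\
\vdots & \vdots &        & \vdots \\
c_{m1} & c_{m2} & \cdots & c_{mq}
\end{Vmatrix}$$

in which the element $c_{ij}$ at the intersection of the i-th row and the j-th column
is the ‘product’⁴ of the i-th row of the first matrix A into the j-th column
of the second matrix B:

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} \qquad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, q). \tag{11}$$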


The operation of forming the product of given matrices is called matrix 
multiplication. 


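Example.

$$\begin{Vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{Vmatrix}
\begin{Vmatrix} c_1 & d_1 & e_1 & f_1 \\ c_2 & d_2 & e_2 & f_2 \\ c_3 & d_3 & e_3 & f_3 \end{Vmatrix} =
\begin{Vmatrix}
a_1c_1 + a_2c_2 + a_3c_3 & a_1d_1 + a_2d_2 + a_3d_3 & a_1e_1 + a_2e_2 + a_3e_3 & a_1f_1 + a_2f_2 + a_3f_3 \\
b_1c_1 + b_2c_2 + b_3c_3 & b_1d_1 + b_2d_2 + b_3d_3 & b_1e_1 + b_2e_2 + b_3e_3 & b_1f_1 + b_2f_2 + b_3f_3
\end{Vmatrix}.$$

By Definition 4 the coefficient matrix of the transformation (10) is the
product of the coefficient matrices of (8) and (9).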

Note that the operation of multiplication of two rectangular matrices can
only be carried out when the number of columns of the first factor is equal
to the number of rows of the second. In particular, multiplication is always
possible when both factors are square matrices of one and the same order.
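Formula (11), the dimension condition just stated, and the composite transformation (10) can be checked together in a short Python sketch. The helper names `mat_mul` and `transform` are hypothetical, and the example matrices are arbitrary.

```python
# Hypothetical helper names; not from the book.


def mat_mul(A, B):
    """C = AB with c_ij = sum_k a_ik b_kj, formula (11)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of the first factor must equal rows of the second")
    q = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
            for i in range(len(A))]


def transform(M, x):
    """Apply the linear transformation with coefficient matrix M to the list x."""
    return [sum(m * xk for m, xk in zip(row, x)) for row in M]


A = [[1, 2, 0],
     [0, 1, -1]]          # 2 x 3: z = A y, transformation (8)
B = [[1, 0],
     [2, 1],
     [0, 3]]              # 3 x 2: y = B x, transformation (9)
x = [5, -2]

print(transform(A, transform(B, x)))   # [21, 14]: substitute (9) into (8)
print(mat_mul(A, B))                   # [[5, 2], [2, -2]]
print(transform(mat_mul(A, B), x))     # [21, 14]: one step with AB, as in (10)
```

Calling `mat_mul(B, B)` here raises `ValueError`, since B has 2 columns but 3 rows.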


⁴ The product of two sequences of numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ is defined
as the sum of the products of the corresponding numbers: $\sum_{i=1}^{n} a_i b_i$.


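The reader should observe that even in this special case the multiplication
of matrices does not have the property of commutativity. For example,

$$\begin{Vmatrix} 2 & 0 \\ 3 & -1 \end{Vmatrix} \begin{Vmatrix} 1 & 2 \\ 3 & 4 \end{Vmatrix} = \begin{Vmatrix} 2 & 4 \\ 0 & 2 \end{Vmatrix},
\quad\text{but}\quad
\begin{Vmatrix} 1 & 2 \\ 3 & 4 \end{Vmatrix} \begin{Vmatrix} 2 & 0 \\ 3 & -1 \end{Vmatrix} = \begin{Vmatrix} 8 & -2 \\ 18 & -4 \end{Vmatrix}.$$

If AB = BA, then the matrices A and B are called permutable or commuting.

Example. The matrices

$$A = \begin{Vmatrix} 1 & 2 \\ -2 & 0 \end{Vmatrix} \quad\text{and}\quad B = \begin{Vmatrix} -3 & 2 \\ -2 & -4 \end{Vmatrix}$$

are permutable, because

$$AB = \begin{Vmatrix} -7 & -6 \\ 6 & -4 \end{Vmatrix} \quad\text{and}\quad BA = \begin{Vmatrix} -7 & -6 \\ 6 & -4 \end{Vmatrix}.$$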
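The two examples above can be verified mechanically. In this throwaway Python check, `mat_mul` is a hypothetical helper written with `zip`, so that `zip(*B)` runs over the columns of B; it multiplies the matrices in both orders.

```python
# Hypothetical helper name; not from the book.


def mat_mul(A, B):
    """Row-by-column product; zip(*B) yields the columns of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


P = [[2, 0], [3, -1]]
Q = [[1, 2], [3, 4]]
print(mat_mul(P, Q))   # [[2, 4], [0, 2]]
print(mat_mul(Q, P))   # [[8, -2], [18, -4]], so PQ != QP

A = [[1, 2], [-2, 0]]
B = [[-3, 2], [-2, -4]]
print(mat_mul(A, B) == mat_mul(B, A))   # True: A and B are permutable
```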


It is very easy to verify the associative property of matrix multiplication
and also the distributive property of multiplication with respect to addition:

1. (AB)C = A(BC),
2. (A + B)C = AC + BC,
3. A(B + C) = AB + AC.
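These identities are proved from (11) by rearranging finite sums. A numerical spot check does not replace that proof, but it is a useful sanity test of an implementation. The Python sketch below uses hypothetical helper names and seeded random integer matrices, with the letters renamed so that every product is defined. It checks all three laws exactly, since integer arithmetic involves no rounding.

```python
import random

# Hypothetical helper names; not from the book.


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def random_matrix(rows, cols, rng):
    return [[rng.randint(-5, 5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
A, B = random_matrix(2, 3, rng), random_matrix(2, 3, rng)   # 2 x 3
C, D = random_matrix(3, 4, rng), random_matrix(3, 4, rng)   # 3 x 4
E = random_matrix(4, 2, rng)                                # 4 x 2

assert mat_mul(mat_mul(A, C), E) == mat_mul(A, mat_mul(C, E))              # 1.
assert mat_mul(mat_add(A, B), C) == mat_add(mat_mul(A, C), mat_mul(B, C))  # 2.
assert mat_mul(A, mat_add(C, D)) == mat_add(mat_mul(A, C), mat_mul(A, D))  # 3.
print("all three laws hold for this sample")
```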

The definition of matrix multiplication extends in a natural way to the 
case of several factors. 


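When we make use of the multiplication of rectangular matrices, we can
write the linear transformation

$$\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n, \\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n, \\
    &\;\;\vdots \\
y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned}$$

in the form

$$\begin{Vmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{Vmatrix} =
\begin{Vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{Vmatrix}
\begin{Vmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{Vmatrix},$$

or, in abbreviated form,

$$y = Ax.$$

Here $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_m)$ are column matrices
and $A = \| a_{ik} \|$ is a rectangular matrix of dimension m × n.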

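Let us treat the special case where in the product C = AB the second
factor is a square diagonal matrix $B = \{d_1, d_2, \ldots, d_n\}$. Then it follows
from (11) that

$$c_{ik} = a_{ik} d_k \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n),$$

i.e.,

$$\begin{Vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{Vmatrix}
\begin{Vmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_n
\end{Vmatrix} =
\begin{Vmatrix}
a_{11}d_1 & a_{12}d_2 & \cdots & a_{1n}d_n \\
a_{21}d_1 & a_{22}d_2 & \cdots & a_{2n}d_n \\
\vdots & \vdots &        & \vdots \\
a_{m1}d_1 & a_{m2}d_2 & \cdots & a_{mn}d_n
\end{Vmatrix}.$$

Similarly,

$$\begin{Vmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_m
\end{Vmatrix}
\begin{Vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{Vmatrix} =
\begin{Vmatrix}
d_1a_{11} & d_1a_{12} & \cdots & d_1a_{1n} \\
d_2a_{21} & d_2a_{22} & \cdots & d_2a_{2n} \\
\vdots & \vdots &        & \vdots \\
d_ma_{m1} & d_ma_{m2} & \cdots & d_ma_{mn}
\end{Vmatrix}.$$

Hence: When a rectangular matrix A is multiplied on the right (left)
by a diagonal matrix $\{d_1, d_2, \ldots\}$, then the columns (rows) of A are
multiplied by $d_1, d_2, \ldots$, respectively.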
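The rule just stated can be seen in a small Python sketch (`diag`, `mat_mul`, `scale_columns` and `scale_rows` are hypothetical helpers). It also points to a practical consequence: scaling the columns or rows directly takes mn multiplications, whereas forming the diagonal matrix and using the naive product takes $mn^2$ (right factor) or $m^2n$ (left factor).

```python
# Hypothetical helper names; not from the book.


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def diag(*d):
    """The diagonal matrix {d_1, ..., d_n}, i.e. ||d_i delta_ik||."""
    return [[d[i] if i == k else 0 for k in range(len(d))] for i in range(len(d))]


def scale_columns(A, d):
    """A {d_1, ..., d_n} computed directly: column k is multiplied by d_k."""
    return [[a * dk for a, dk in zip(row, d)] for row in A]


def scale_rows(d, A):
    """{d_1, ..., d_m} A computed directly: row i is multiplied by d_i."""
    return [[di * a for a in row] for di, row in zip(d, A)]


A = [[1, 2, 3],
     [4, 5, 6]]
print(mat_mul(A, diag(10, 100, 1000)))   # [[10, 200, 3000], [40, 500, 6000]]
print(mat_mul(diag(2, -1), A))           # [[2, 4, 6], [-4, -5, -6]]
```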


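4. Suppose that a square matrix $C = \| c_{ij} \|_1^m$ is the product of two
rectangular matrices $A = \| a_{ik} \|$ and $B = \| b_{kj} \|$ of dimension m × n and n × m,
respectively:

$$\begin{Vmatrix}
c_{11} & \cdots & c_{1m} \\
\vdots &        & \vdots \\
c_{m1} & \cdots & c_{mm}
\end{Vmatrix} =
\begin{Vmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots &        & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{Vmatrix}
\begin{Vmatrix}
b_{11} & \cdots & b_{1m} \\
\vdots &        & \vdots \\
b_{n1} & \cdots & b_{nm}
\end{Vmatrix}, \tag{12}$$

i.e.,

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} \qquad (i, j = 1, 2, \ldots, m). \tag{13}$$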


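We shall establish the important Binet-Cauchy formula, which expresses
the determinant |C| in terms of the minors of A and B:

$$\begin{vmatrix}
c_{11} & \cdots & c_{1m} \\
\vdots &        & \vdots \\
c_{m1} & \cdots & c_{mm}
\end{vmatrix} =
\sum_{1 \le k_1 < k_2 < \cdots < k_m \le n}
\begin{vmatrix}
a_{1k_1} & \cdots & a_{1k_m} \\
\vdots   &        & \vdots \\
a_{mk_1} & \cdots & a_{mk_m}
\end{vmatrix}
\begin{vmatrix}
b_{k_11} & \cdots & b_{k_1m} \\
\vdots   &        & \vdots \\
b_{k_m1} & \cdots & b_{k_mm}
\end{vmatrix} \tag{14}$$

or, in the notation of page 2,

$$C\begin{pmatrix} 1 & 2 & \cdots & m \\ 1 & 2 & \cdots & m \end{pmatrix} =
\sum_{1 \le k_1 < k_2 < \cdots < k_m \le n}
A\begin{pmatrix} 1 & 2 & \cdots & m \\ k_1 & k_2 & \cdots & k_m \end{pmatrix}
B\begin{pmatrix} k_1 & k_2 & \cdots & k_m \\ 1 & 2 & \cdots & m \end{pmatrix}. \tag{14'}$$

According to this formula the determinant of C is the sum of the products
of all possible minors of the maximal (m-th) order⁵ of A into the
corresponding minors of the same order of B.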


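Derivation of the Binet-Cauchy formula. By (13) the determinant of
C can be represented in the form

$$\begin{vmatrix}
c_{11} & \cdots & c_{1m} \\
\vdots &        & \vdots \\
c_{m1} & \cdots & c_{mm}
\end{vmatrix} =
\begin{vmatrix}
\sum_{\alpha_1=1}^{n} a_{1\alpha_1} b_{\alpha_1 1} & \cdots & \sum_{\alpha_m=1}^{n} a_{1\alpha_m} b_{\alpha_m m} \\
\vdots &        & \vdots \\
\sum_{\alpha_1=1}^{n} a_{m\alpha_1} b_{\alpha_1 1} & \cdots & \sum_{\alpha_m=1}^{n} a_{m\alpha_m} b_{\alpha_m m}
\end{vmatrix} =
\sum_{\alpha_1, \ldots, \alpha_m = 1}^{n}
\begin{vmatrix}
a_{1\alpha_1} & \cdots & a_{1\alpha_m} \\
\vdots        &        & \vdots \\
a_{m\alpha_1} & \cdots & a_{m\alpha_m}
\end{vmatrix}
b_{\alpha_1 1} b_{\alpha_2 2} \cdots b_{\alpha_m m}. \tag{15}$$

If m > n, then among the numbers $\alpha_1, \alpha_2, \ldots, \alpha_m$ there are always at
least two that are equal, so that every summand on the right-hand side of
(15) is zero (a determinant with two equal columns vanishes). Hence in this
case |C| = 0.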

Now let m =n. Then 1h the sum on the right-hand side of (15) all those 
summands will be zero in which at least two of the subscripts a;, ao, ..., Gm 
are equal. All the remaining summands of (15) can be split into groups of 
m! terms each by combining into one group those summands that differ 
from each other only in the order of the subscripts a;, a2, ..., Gm (so that 


5 When m >, the matrices A and B do not have minors of order m. In that case 
the right-hand sides of (14) and (14’) are to be replaced by zero. 


10 _J. Matrices anp Matrix OPERATIONS 


within each such group the subscripts $\alpha_1, \alpha_2, \ldots, \alpha_m$ have one and the same set of values). Now within one such group the sum of the corresponding terms is⁶


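$$\sum \varepsilon(\alpha_1, \alpha_2, \ldots, \alpha_m)\, A\begin{pmatrix} 1 & 2 & \cdots & m \\ k_1 & k_2 & \cdots & k_m \end{pmatrix} b_{\alpha_1 1} b_{\alpha_2 2} \cdots b_{\alpha_m m} = A\begin{pmatrix} 1 & 2 & \cdots & m \\ k_1 & k_2 & \cdots & k_m \end{pmatrix} \sum \varepsilon(\alpha_1, \alpha_2, \ldots, \alpha_m)\, b_{\alpha_1 1} b_{\alpha_2 2} \cdots b_{\alpha_m m} = A\begin{pmatrix} 1 & 2 & \cdots & m \\ k_1 & k_2 & \cdots & k_m \end{pmatrix} B\begin{pmatrix} k_1 & k_2 & \cdots & k_m \\ 1 & 2 & \cdots & m \end{pmatrix}.$$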


Hence from (15) we obtain (14’). 
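As a numerical illustration, which is not part of the original text, the following Python sketch evaluates both sides of (14’) in exact integer arithmetic. The helper names `det`, `matmul`, `minor`, `binet_cauchy` and the sample matrices are our own. Indices are 0-based, and `det` uses plain cofactor expansion, which is only sensible for small matrices.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row (adequate for small matrices)."""
    if not M:
        return 1  # determinant of the empty matrix
    total = 0
    for j in range(len(M)):
        sub = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(sub)
    return total

def matmul(A, B):
    """Product of an m x n matrix A and an n x q matrix B, given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def minor(M, rows, cols):
    """Minor of M formed from the given row and column indices."""
    return det([[M[i][j] for j in cols] for i in rows])

def binet_cauchy(A, B):
    """Right-hand side of (14'): sum over k_1 < ... < k_m of A(1..m; k) * B(k; 1..m).
    When m > n there are no such combinations, so the sum is 0 (footnote 5)."""
    m, n = len(A), len(A[0])
    rows = range(m)
    return sum(minor(A, rows, ks) * minor(B, ks, rows) for ks in combinations(range(n), m))

A = [[1, 2, 0], [3, -1, 4]]    # 2 x 3
B = [[2, 1], [0, 5], [-3, 2]]  # 3 x 2
print(det(matmul(A, B)), binet_cauchy(A, B))  # 78 78
print(det(matmul(B, A)), binet_cauchy(B, A))  # 0 0   (here m = 3 > n = 2)
```

The second line shows the case $m > n$. There are no combinations $k_1 < \cdots < k_m$, so the sum is empty and equals zero.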


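Example 1.

$$\begin{Vmatrix} a_1c_1 + a_2c_2 + \cdots + a_nc_n & a_1d_1 + a_2d_2 + \cdots + a_nd_n \\ b_1c_1 + b_2c_2 + \cdots + b_nc_n & b_1d_1 + b_2d_2 + \cdots + b_nd_n \end{Vmatrix} = \begin{Vmatrix} a_1 & a_2 & \cdots & a_n \\ b_1 & b_2 & \cdots & b_n \end{Vmatrix} \begin{Vmatrix} c_1 & d_1 \\ c_2 & d_2 \\ \vdots & \vdots \\ c_n & d_n \end{Vmatrix}.$$

Therefore formula (14) yields the so-called Cauchy identity

$$\begin{vmatrix} a_1c_1 + a_2c_2 + \cdots + a_nc_n & a_1d_1 + a_2d_2 + \cdots + a_nd_n \\ b_1c_1 + b_2c_2 + \cdots + b_nc_n & b_1d_1 + b_2d_2 + \cdots + b_nd_n \end{vmatrix} = \sum_{1 \le i < k \le n} \begin{vmatrix} a_i & a_k \\ b_i & b_k \end{vmatrix} \begin{vmatrix} c_i & d_i \\ c_k & d_k \end{vmatrix}. \qquad (16)$$

Setting $c_i = a_i$, $d_i = b_i$ $(i = 1, 2, \ldots, n)$ in this identity, we obtain:

$$\begin{vmatrix} a_1^2 + a_2^2 + \cdots + a_n^2 & a_1b_1 + a_2b_2 + \cdots + a_nb_n \\ a_1b_1 + a_2b_2 + \cdots + a_nb_n & b_1^2 + b_2^2 + \cdots + b_n^2 \end{vmatrix} = \sum_{1 \le i < k \le n} \begin{vmatrix} a_i & a_k \\ b_i & b_k \end{vmatrix}^2.$$

If $a_i$ and $b_i$ $(i = 1, 2, \ldots, n)$ are real numbers, we deduce the well-known inequality

$$(a_1b_1 + a_2b_2 + \cdots + a_nb_n)^2 \le (a_1^2 + a_2^2 + \cdots + a_n^2)(b_1^2 + b_2^2 + \cdots + b_n^2). \qquad (17)$$

Here the equality sign holds if and only if all the numbers $a_i$ are proportional to the corresponding numbers $b_i$ $(i = 1, 2, \ldots, n)$.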
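The following short Python sketch is our own illustration, and its function names are hypothetical. It checks the Cauchy identity (16) on integer data. It also shows that the difference between the two sides of (17) is a sum of squares, which vanishes for proportional sequences.

```python
from itertools import combinations

def dot(u, v):
    """Sum of the products u_1 v_1 + ... + u_n v_n."""
    return sum(x * y for x, y in zip(u, v))

def cauchy_identity(a, b, c, d):
    """Return (left side, right side) of the Cauchy identity (16) for equal-length sequences."""
    lhs = dot(a, c) * dot(b, d) - dot(a, d) * dot(b, c)   # the 2 x 2 determinant
    rhs = sum((a[i] * b[k] - a[k] * b[i]) * (c[i] * d[k] - c[k] * d[i])
              for i, k in combinations(range(len(a)), 2))
    return lhs, rhs

def cauchy_gap(a, b):
    """(sum a_i^2)(sum b_i^2) - (sum a_i b_i)^2, i.e. (16) with c = a, d = b.
    It is a sum of squares, hence >= 0 for real data, and 0 iff a and b are proportional."""
    lhs, rhs = cauchy_identity(a, b, a, b)
    assert lhs == rhs  # sanity check of the identity itself
    return lhs

print(cauchy_identity([1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, -1]))  # (144, 144)
print(cauchy_gap([1, 2, 3], [4, 5, 6]))   # 54
print(cauchy_gap([1, 2, 3], [2, 4, 6]))   # 0: proportional sequences give equality in (17)
```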
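Example 2.

$$\begin{Vmatrix} a_1c_1 + b_1d_1 & \cdots & a_1c_n + b_1d_n \\ \vdots & & \vdots \\ a_nc_1 + b_nd_1 & \cdots & a_nc_n + b_nd_n \end{Vmatrix} = \begin{Vmatrix} a_1 & b_1 \\ \vdots & \vdots \\ a_n & b_n \end{Vmatrix} \begin{Vmatrix} c_1 & \cdots & c_n \\ d_1 & \cdots & d_n \end{Vmatrix}.$$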


⁶ Here $k_1 < k_2 < \cdots < k_m$ is the normal order of the subscripts $\alpha_1, \alpha_2, \ldots, \alpha_m$ and $\varepsilon(\alpha_1, \alpha_2, \ldots, \alpha_m) = (-1)^N$, where $N$ is the number of transpositions of the indices needed to put the permutation $\alpha_1, \alpha_2, \ldots, \alpha_m$ into normal order.




Therefore for $n > 2$

$$\begin{vmatrix} a_1c_1 + b_1d_1 & \cdots & a_1c_n + b_1d_n \\ \vdots & & \vdots \\ a_nc_1 + b_nd_1 & \cdots & a_nc_n + b_nd_n \end{vmatrix} = 0.$$


Let us consider the special case where $A$ and $B$ are square matrices of one and the same order $n$. When we set $m = n$ in (14’), we arrive at the well-known multiplication theorem for determinants:

$$C\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix} = A\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix} B\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix}$$

or, in another notation,

$$|C| = |AB| = |A| \cdot |B|. \qquad (18)$$

Thus, the determinant of the product of two square matrices is equal to the product of the determinants of the factors.


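5. The Binet-Cauchy formula enables us, in the general case also, to express the minors of the product of two rectangular matrices in terms of the minors of the factors. Let

$$A = \|a_{ik}\|, \quad B = \|b_{kj}\|, \quad C = \|c_{ij}\| \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n;\ j = 1, 2, \ldots, q)$$

and

$$C = AB.$$

We consider an arbitrary minor of $C$:

$$C\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ j_1 & j_2 & \cdots & j_p \end{pmatrix} \qquad \left(1 \le i_1 < i_2 < \cdots < i_p \le m;\ 1 \le j_1 < j_2 < \cdots < j_p \le q;\ p \le m \text{ and } p \le q\right).$$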


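The matrix formed from the elements of this minor is the product of two rectangular matrices

$$\begin{Vmatrix} a_{i_1 1} & a_{i_1 2} & \cdots & a_{i_1 n} \\ \vdots & \vdots & & \vdots \\ a_{i_p 1} & a_{i_p 2} & \cdots & a_{i_p n} \end{Vmatrix} \quad \text{and} \quad \begin{Vmatrix} b_{1 j_1} & \cdots & b_{1 j_p} \\ b_{2 j_1} & \cdots & b_{2 j_p} \\ \vdots & & \vdots \\ b_{n j_1} & \cdots & b_{n j_p} \end{Vmatrix}.$$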




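Therefore, by applying the Binet-Cauchy formula, we obtain:⁷

$$C\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ j_1 & j_2 & \cdots & j_p \end{pmatrix} = \sum_{1 \le k_1 < k_2 < \cdots < k_p \le n} A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} B\begin{pmatrix} k_1 & k_2 & \cdots & k_p \\ j_1 & j_2 & \cdots & j_p \end{pmatrix}. \qquad (19)$$

For $p = 1$ formula (19) goes over into (11). For $p > 1$ formula (19) is a natural generalization of (11).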
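We mention another consequence of (19).

The rank of the product of two rectangular matrices does not exceed the rank of either factor.

If $C = AB$ and $r_A, r_B, r_C$ are the ranks of $A, B, C$, then

$$r_C \le \min(r_A, r_B).$$

Indeed, by (19) every minor of $C$ of order $p > r_A$ is a sum of terms, each of which contains as a factor a minor of $A$ of order $p$. All such minors of $A$ vanish. The same argument applies to $B$.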
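The rank inequality can be observed on small examples. The sketch below is our own. Its `rank` helper is a plain Gaussian elimination over the rationals using Python's `fractions.Fraction`, not a routine from the text. The second pair of matrices shows that the inequality may be strict.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of an m x n and an n x q matrix given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def rank(M):
    """Rank by Gaussian elimination with row interchanges, in exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

A = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]  # second row = 2 * first row: rank 2
B = [[1, 0], [0, 1], [1, 1]]           # rank 2
C = matmul(A, B)
print(rank(A), rank(B), rank(C))        # 2 2 2

P = [[1, 0], [0, 0]]                    # rank 1
Q = [[0, 0], [0, 1]]                    # rank 1
print(rank(matmul(P, Q)))               # 0, so the inequality can be strict
```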


§ 3. Square Matrices 


1. The square matrix of order $n$ in which the main diagonal consists entirely of units and all the other elements are zero is called the unit matrix and is denoted by $E^{(n)}$ or simply by $E$. The name ‘unit matrix’ is connected with the following property of $E$: For every rectangular matrix

$$A = \|a_{ik}\| \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n)$$

we have

$$E^{(m)}A = AE^{(n)} = A.$$

Clearly

$$E^{(n)} = \|\delta_{ik}\|_1^n.$$

Let $A = \|a_{ik}\|_1^n$ be a square matrix. Then the power of the matrix is defined in the usual way:

$$A^p = \underbrace{AA \cdots A}_{p \text{ times}} \quad (p = 1, 2, \ldots); \qquad A^0 = E.$$

From the associative property of matrix multiplication it follows that

$$A^pA^q = A^{p+q}.$$

Here $p$ and $q$ are arbitrary non-negative integers.


⁷ It follows from the Binet-Cauchy formula that the minors of order $p$ in $C$ for $p > n$ (if minors of such orders exist) are all zero. In that case the right-hand side of (19) is to be replaced by zero. See footnote 5, p. 9.




We consider a polynomial (integral rational function) with coefficients in the field $F$:

$$f(t) = \alpha_0 t^m + \alpha_1 t^{m-1} + \cdots + \alpha_m.$$

Then by $f(A)$ we shall mean the matrix

$$f(A) = \alpha_0 A^m + \alpha_1 A^{m-1} + \cdots + \alpha_m E.$$

We define in this way a polynomial in a matrix.
Suppose that $f(t)$ is the product of two polynomials $g(t)$ and $h(t)$:

$$f(t) = g(t)h(t). \qquad (21)$$

The polynomial $f(t)$ is obtained from $g(t)$ and $h(t)$ by multiplication term by term and collection of similar terms. In this we make use of the multiplication rule for powers: $t^p t^q = t^{p+q}$. Since all these operations remain valid when the scalar $t$ is replaced by the matrix $A$, it follows from (21) that

$$f(A) = g(A)h(A).$$

Hence, in particular,⁸

$$g(A)h(A) = h(A)g(A); \qquad (22)$$

i.e., two polynomials in one and the same matrix are always permutable.
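To see (22) at work, the following sketch evaluates polynomials in a matrix by Horner's scheme. It is illustrative only, and `poly_at` and `poly_mul` are hypothetical helper names. The coefficients are listed from the highest degree down, as in the definition of $f(A)$ above. The sketch checks that $g(A)h(A)$, $h(A)g(A)$ and $f(A)$ coincide when $f = gh$.

```python
def matmul(A, B):
    """Product of square matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def poly_at(coeffs, A):
    """f(A) = alpha_0 A^m + alpha_1 A^(m-1) + ... + alpha_m E, evaluated by Horner's scheme.
    coeffs = [alpha_0, ..., alpha_m], highest degree first as in the text."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = matmul(result, A)   # result <- result * A
        for i in range(n):
            result[i][i] += c        # ... + c * E
    return result

def poly_mul(g, h):
    """Coefficients (highest degree first) of the product polynomial g(t) h(t)."""
    out = [0] * (len(g) + len(h) - 1)
    for i, x in enumerate(g):
        for j, y in enumerate(h):
            out[i + j] += x * y
    return out

A = [[1, 2], [3, 4]]
g = [1, -2]       # g(t) = t - 2
h = [2, 0, 1]     # h(t) = 2t^2 + 1
gA, hA = poly_at(g, A), poly_at(h, A)
fA = poly_at(poly_mul(g, h), A)
print(matmul(gA, hA) == fA == matmul(hA, gA))  # True
```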


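Examples.

Let the sequence of elements $a_{ik}$ for which $k - i = p$ ($i - k = p$) in a rectangular matrix $A = \|a_{ik}\|$ be called the $p$-th superdiagonal (subdiagonal) of the matrix. We denote by $H^{(n)}$ the square matrix of order $n$ in which all the elements of the first superdiagonal are units and all the other elements are zero. The matrix $H^{(n)}$ will also be denoted simply by $H$. Then

$$H = \begin{Vmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{Vmatrix}, \quad H^2 = \begin{Vmatrix} 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \end{Vmatrix}, \quad \ldots, \quad H^{n-1} = \begin{Vmatrix} 0 & 0 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{Vmatrix};$$

$$H^p = O \qquad (p \ge n).$$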


⁸ For each of these products is equal to one and the same $f(A)$, since $h(t)g(t) = f(t)$ as well. It is worth mentioning that the substitution of matrices in an algebraic identity in several variables is not valid in general. The substitution of matrices that commute with one another, however, is allowable in this case.


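By these equations, if

$$f(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_{n-1}t^{n-1} + \cdots$$

is a polynomial in $t$, then

$$f(H) = a_0E + a_1H + a_2H^2 + \cdots = \begin{Vmatrix} a_0 & a_1 & a_2 & \cdots & a_{n-1} \\ 0 & a_0 & a_1 & \cdots & a_{n-2} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & a_0 & a_1 \\ 0 & 0 & \cdots & 0 & a_0 \end{Vmatrix}.$$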


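Similarly, if $F$ is the square matrix of order $n$ in which all the elements of the first subdiagonal are units and all others are zero, then

$$f(F) = a_0E + a_1F + a_2F^2 + \cdots = \begin{Vmatrix} a_0 & 0 & \cdots & 0 & 0 \\ a_1 & a_0 & \cdots & 0 & 0 \\ \vdots & \ddots & \ddots & & \vdots \\ a_{n-2} & \cdots & a_1 & a_0 & 0 \\ a_{n-1} & a_{n-2} & \cdots & a_1 & a_0 \end{Vmatrix}.$$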
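The Toeplitz structure of $f(H)$ and $f(F)$ can be confirmed with a few lines of Python. This is our own sketch. The names `shift_H`, `shift_F` and `poly_ascending` are hypothetical, indices are 0-based, and the coefficients are listed in ascending order as in $f(t) = a_0 + a_1t + \cdots$.

```python
def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def shift_H(n):
    """H of order n: units on the first superdiagonal (k - i = 1), zeros elsewhere."""
    return [[1 if k - i == 1 else 0 for k in range(n)] for i in range(n)]

def shift_F(n):
    """F of order n: units on the first subdiagonal (i - k = 1), zeros elsewhere."""
    return [[1 if i - k == 1 else 0 for k in range(n)] for i in range(n)]

def poly_ascending(a, M):
    """f(M) = a[0] E + a[1] M + a[2] M^2 + ... (ascending coefficients, as in the text)."""
    n = len(M)
    power = [[int(i == k) for k in range(n)] for i in range(n)]  # M^0 = E
    result = [[0] * n for _ in range(n)]
    for c in a:
        result = [[r + c * q for r, q in zip(rrow, qrow)] for rrow, qrow in zip(result, power)]
        power = matmul(power, M)  # next power of M
    return result

n = 4
a = [5, -1, 2, 7, 3]  # a_0, ..., a_4; the a_4 term contributes nothing since H^4 = F^4 = O
fH = poly_ascending(a, shift_H(n))
fF = poly_ascending(a, shift_F(n))
for row in fH:
    print(row)
# [5, -1, 2, 7]
# [0, 5, -1, 2]
# [0, 0, 5, -1]
# [0, 0, 0, 5]
```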


We leave it to the reader to verify the following properties of the matrices $H$ and $F$:

1. When an arbitrary rectangular matrix $A$ of dimension $m \times n$ is multiplied on the left by the matrix $H$ (or $F$) of order $m$, then all the rows of $A$ are shifted upward (or downward) by one place, the first (last) row of $A$ disappears, and the last (first) row of the product is filled by zeros. For example,


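$$\begin{Vmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{Vmatrix} \begin{Vmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{Vmatrix} = \begin{Vmatrix} b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \\ 0 & 0 & 0 & 0 \end{Vmatrix},$$

$$\begin{Vmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{Vmatrix} \begin{Vmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{Vmatrix} = \begin{Vmatrix} 0 & 0 & 0 & 0 \\ a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \end{Vmatrix}.$$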


2. When an arbitrary rectangular matrix $A$ of dimension $m \times n$ is multiplied on the right by the matrix $H$ (or $F$) of order $n$, then all the columns of $A$ are shifted to the right (left) by one place, the last (first) column of $A$ disappears, and the first (last) column of the product is filled by zeros. For example,




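$$\begin{Vmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{Vmatrix} \begin{Vmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{Vmatrix} = \begin{Vmatrix} 0 & a_1 & a_2 & a_3 \\ 0 & b_1 & b_2 & b_3 \\ 0 & c_1 & c_2 & c_3 \end{Vmatrix},$$

$$\begin{Vmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{Vmatrix} \begin{Vmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{Vmatrix} = \begin{Vmatrix} a_2 & a_3 & a_4 & 0 \\ b_2 & b_3 & b_4 & 0 \\ c_2 & c_3 & c_4 & 0 \end{Vmatrix}.$$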
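Properties 1 and 2 can be checked mechanically. The sketch below is ours and repeats the hypothetical helpers `shift_H` and `shift_F` so that it stands alone. It uses numbers in place of the letters $a_i, b_i, c_i$ of the examples.

```python
def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def shift_H(n):
    """H of order n: units on the first superdiagonal."""
    return [[1 if k - i == 1 else 0 for k in range(n)] for i in range(n)]

def shift_F(n):
    """F of order n: units on the first subdiagonal."""
    return [[1 if i - k == 1 else 0 for k in range(n)] for i in range(n)]

A = [[1, 2, 3, 4],     # plays the role of the row a_1 ... a_4
     [5, 6, 7, 8],     # row b_1 ... b_4
     [9, 10, 11, 12]]  # row c_1 ... c_4

print(matmul(shift_H(3), A))  # rows up:       [[5, 6, 7, 8], [9, 10, 11, 12], [0, 0, 0, 0]]
print(matmul(shift_F(3), A))  # rows down:     [[0, 0, 0, 0], [1, 2, 3, 4], [5, 6, 7, 8]]
print(matmul(A, shift_H(4)))  # columns right: [[0, 1, 2, 3], [0, 5, 6, 7], [0, 9, 10, 11]]
print(matmul(A, shift_F(4)))  # columns left:  [[2, 3, 4, 0], [6, 7, 8, 0], [10, 11, 12, 0]]
```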


2. A square matrix $A$ is called singular if $|A| = 0$. Otherwise $A$ is called non-singular.


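Let $A = \|a_{ik}\|_1^n$ be a non-singular matrix ($|A| \ne 0$). Let us consider the linear transformation with coefficient matrix $A$:

$$y_i = \sum_{k=1}^{n} a_{ik} x_k \qquad (i = 1, 2, \ldots, n). \qquad (23)$$

When we regard (23) as equations for $x_1, x_2, \ldots, x_n$ and observe that the determinant of the system of equations (23) is, by assumption, different from zero, then we can express $x_1, x_2, \ldots, x_n$ in terms of $y_1, y_2, \ldots, y_n$ by means of the well-known formulas:

$$x_i = \frac{1}{|A|} \begin{vmatrix} a_{11} & \cdots & a_{1,i-1} & y_1 & a_{1,i+1} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2,i-1} & y_2 & a_{2,i+1} & \cdots & a_{2n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,i-1} & y_n & a_{n,i+1} & \cdots & a_{nn} \end{vmatrix} \equiv \sum_{k=1}^{n} a_{ik}^{(-1)} y_k \qquad (i = 1, 2, \ldots, n). \qquad (24)$$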

We have thus obtained the ‘inverse’ transformation of the transformation (23). The coefficient matrix of this transformation

$$A^{-1} = \|a_{ik}^{(-1)}\|_1^n$$

will be called the inverse matrix of $A$. From (24) it is easy to see that

$$a_{ik}^{(-1)} = \frac{A_{ki}}{|A|} \qquad (i, k = 1, 2, \ldots, n), \qquad (25)$$

where $A_{ki}$ is the algebraic complement (the cofactor) of the element $a_{ki}$ in the determinant $|A|$ $(i, k = 1, 2, \ldots, n)$.




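For example, if

$$A = \begin{Vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{Vmatrix} \quad \text{and} \quad |A| \ne 0,$$

then

$$A^{-1} = \frac{1}{|A|} \begin{Vmatrix} b_2c_3 - b_3c_2 & a_3c_2 - a_2c_3 & a_2b_3 - a_3b_2 \\ b_3c_1 - b_1c_3 & a_1c_3 - a_3c_1 & a_3b_1 - a_1b_3 \\ b_1c_2 - b_2c_1 & a_2c_1 - a_1c_2 & a_1b_2 - a_2b_1 \end{Vmatrix}.$$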
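Formula (25) translates directly into code. The sketch below is an illustration under our own naming (`cofactor`, `inverse_by_cofactors`). It uses exact rational arithmetic and cofactor expansion, so it is only practical for small orders.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def cofactor(M, i, k):
    """Algebraic complement A_ik: signed minor obtained by deleting row i and column k."""
    sub = [row[:k] + row[k + 1:] for r, row in enumerate(M) if r != i]
    return (-1) ** (i + k) * det(sub)

def inverse_by_cofactors(M):
    """A^{-1} with entries a_ik^(-1) = A_ki / |A|, formula (25), in exact arithmetic."""
    d = det(M)
    if d == 0:
        raise ValueError("|A| = 0: the matrix is singular and has no inverse")
    n = len(M)
    # note the exchanged indices: entry (i, k) uses the cofactor of a_ki
    return [[Fraction(cofactor(M, k, i), d) for k in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]   # |A| = 18
Ainv = inverse_by_cofactors(A)
E = [[int(i == k) for k in range(3)] for i in range(3)]
print(Ainv[0][0])                               # 11/18, i.e. (b2*c3 - b3*c2) / |A|
print(matmul(A, Ainv) == E == matmul(Ainv, A))  # True, as in (26)
```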


By forming the composite transformation of the given transformation (23) and the inverse (24), in either order, we obtain in both cases the identity transformation (with the unit matrix as coefficient matrix); therefore

$$AA^{-1} = A^{-1}A = E. \qquad (26)$$

The validity of equation (26) can also be established by direct multiplication of the matrices $A$ and $A^{-1}$. In fact, by (25) we have⁹

$$[AA^{-1}]_{ij} = \sum_{k=1}^{n} a_{ik} a_{kj}^{(-1)} = \frac{1}{|A|} \sum_{k=1}^{n} a_{ik} A_{jk} = \delta_{ij} \qquad (i, j = 1, 2, \ldots, n).$$

Similarly,

$$[A^{-1}A]_{ij} = \sum_{k=1}^{n} a_{ik}^{(-1)} a_{kj} = \frac{1}{|A|} \sum_{k=1}^{n} A_{ki} a_{kj} = \delta_{ij} \qquad (i, j = 1, 2, \ldots, n).$$


It is easy to see that the matrix equations

$$AX = E \quad \text{and} \quad XA = E \qquad (|A| \ne 0) \qquad (27)$$

have no solutions other than $X = A^{-1}$. For by multiplying both sides of the first (second) equation on the left (right) by $A^{-1}$ and using the associative property of matrix multiplication we obtain from (26) in both cases:¹⁰

$$X = A^{-1}.$$


⁹ Here we make use of the well-known property of determinants that the sum of the products of the elements of an arbitrary row (column) into the cofactors of the elements of that row (column) is equal to the value of the determinant, and that the sum of the products of the elements of a row (column) into the cofactors of the corresponding elements of another row (column) is zero.

¹⁰ If $A$ is a singular matrix, then the equations (27) have no solution. For if one of these equations had a solution $X = \|x_{ik}\|_1^n$, then we would have by the multiplication theorem of determinants (see formula (18)) that $|A| \cdot |X| = |E| = 1$, and this is impossible when $|A| = 0$.




In the same way it can be shown that each of the matrix equations

$$AX = B, \quad XA = B \qquad (|A| \ne 0), \qquad (28)$$

where $X$ and $B$ are rectangular matrices of equal dimensions and $A$ is a square matrix of appropriate order, has one and only one solution,

$$X = A^{-1}B \quad \text{and} \quad X = BA^{-1}, \qquad (29)$$

respectively. The matrices (29) are the ‘left’ and the ‘right’ quotients on ‘dividing’ $B$ by $A$. From (28) and (29) we deduce (see p. 12) that $r_B \le r_X$ and $r_X \le r_B$, so that $r_X = r_B$. On comparing this with (28), we have:

When a rectangular matrix is multiplied on the left or on the right by a non-singular matrix, the rank of the original matrix remains unchanged.


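Note that (26) implies $|A| \cdot |A^{-1}| = 1$, i.e.,

$$|A^{-1}| = \frac{1}{|A|}.$$

For any two non-singular matrices we have

$$(AB)^{-1} = B^{-1}A^{-1}, \qquad (30)$$

since $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = E$, and by (27) the inverse is unique.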
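The following is a quick check of (30) and of $|A^{-1}| = 1/|A|$ for $2 \times 2$ matrices. It is our own sketch, with the hypothetical helper `inv2` implementing (25) for order 2.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    return a * d - b * c

def inv2(M):
    """Inverse of a 2 x 2 matrix by formula (25): a_ik^(-1) = A_ki / |M|."""
    (a, b), (c, d) = M
    D = det2(M)
    if D == 0:
        raise ValueError("singular matrix")
    return [[Fraction(d, D), Fraction(-b, D)], [Fraction(-c, D), Fraction(a, D)]]

A = [[1, 2], [3, 5]]   # |A| = -1
B = [[2, 1], [1, 1]]   # |B| = 1
print(inv2(matmul(A, B)) == matmul(inv2(B), inv2(A)))  # True: (AB)^{-1} = B^{-1} A^{-1}
print(inv2(matmul(A, B)) == matmul(inv2(A), inv2(B)))  # False: the order of the factors matters
print(det2(inv2(A)) == Fraction(1, det2(A)))           # True: |A^{-1}| = 1 / |A|
```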


3. All the matrices of order $n$ form a ring¹¹ with unit element $E^{(n)}$. Since in this ring the operation of multiplication by a number of $F$ is defined, and since there exists a basis of $n^2$ linearly independent matrices in terms of which all the matrices of order $n$ can be expressed linearly,¹² the ring of matrices of order $n$ is an algebra.¹³


¹¹ A ring is a collection of elements in which two operations are defined and can always be carried out uniquely: the ‘addition’ of two elements (with the commutative and associative properties) and the ‘multiplication’ of two elements (with the associative and distributive properties with respect to addition); moreover, the addition is reversible. See, for example, van der Waerden, Modern Algebra, § 14.

¹² Indeed, an arbitrary matrix $A = \|a_{ik}\|_1^n$ with elements in $F$ can be represented in the form $A = \sum_{i,k=1}^{n} a_{ik} E_{ik}$, where $E_{ik}$ is the matrix of order $n$ in which there is a 1 at the intersection of the $i$-th row and the $k$-th column and all the other elements are zeros.

¹³ See, for example, van der Waerden, Modern Algebra, § 17.




All the square matrices of order $n$ form a commutative group with respect to the operation of addition.¹⁴ All the non-singular matrices of order $n$ form a (non-commutative) group with respect to the operation of multiplication.


A square matrix $A = \|a_{ik}\|_1^n$ is called upper triangular (lower triangular) if all the elements below (above) the main diagonal are zero:

$$A = \begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{Vmatrix} \qquad \left(A = \begin{Vmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{Vmatrix}\right).$$

A diagonal matrix is a special case both of an upper triangular matrix and of a lower triangular matrix.

Since the determinant of a triangular matrix is equal to the product of its diagonal elements, a triangular (and, in particular, a diagonal) matrix is non-singular if and only if all its diagonal elements are different from zero.


It is easy to verify that the sum and the product of two diagonal (upper triangular, lower triangular) matrices is a diagonal (upper triangular, lower triangular) matrix and that the inverse of a non-singular diagonal (upper triangular, lower triangular) matrix is a matrix of the same type. Therefore:

1. All the diagonal matrices of order $n$ form a commutative group under the operation of addition, as do all the upper triangular matrices or all the lower triangular matrices.

2. All the non-singular diagonal matrices form a commutative group under multiplication.

3. All the non-singular upper (lower) triangular matrices form a (non-commutative) group under multiplication.
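The closure statements for triangular matrices can be tried out numerically. In the sketch below, which is our own and uses hypothetical helper names, the inverse of an upper triangular matrix is computed by back substitution. This is one convenient way to see that the inverse is again upper triangular.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def is_upper(M):
    """True if every element below the main diagonal is zero."""
    return all(M[i][k] == 0 for i in range(len(M)) for k in range(i))

def upper_inverse(U):
    """Inverse of a non-singular upper triangular U, found column by column by back
    substitution (solving U x = e_j). All diagonal elements must be non-zero."""
    n = len(U)
    if any(U[i][i] == 0 for i in range(n)):
        raise ValueError("a diagonal element is zero, so U is singular")
    X = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        for i in range(n - 1, -1, -1):
            s = Fraction(int(i == j)) - sum(U[i][k] * X[k][j] for k in range(i + 1, n))
            X[i][j] = s / U[i][i]
    return X

U1 = [[2, 1, 3], [0, 1, -1], [0, 0, 4]]
U2 = [[1, 0, 2], [0, 3, 1], [0, 0, 1]]
print(is_upper(matmul(U1, U2)), is_upper(upper_inverse(U1)))  # True True
```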


4. We conclude this section with a further important operation on matrices: transposition.


¹⁴ A group is a set of objects in which an operation is defined which associates with any two elements $a$ and $b$ of the set a well-defined third element $a * b$ of the same set, provided that

1) the operation has the associative property ($(a * b) * c = a * (b * c)$),

2) there exists a unit element $e$ in the set ($a * e = e * a = a$), and

3) for every element $a$ of the set there exists an inverse element $a^{-1}$ ($a * a^{-1} = a^{-1} * a = e$).

A group is called commutative, or abelian, if the group operation has the commutative property. Concerning the group concept see, for example, [53], pp. 245ff.


§ 4. Compounp Marrices. MINors 19 


If $A = \|a_{ik}\|$ $(i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n)$, then the transpose $A^T$ is defined as $A^T = \|a^T_{ki}\|$, where $a^T_{ki} = a_{ik}$ $(i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n)$. If $A$ is of dimension $m \times n$, then $A^T$ is of dimension $n \times m$.

It is easy to verify the following properties:¹⁵

1. $(A + B)^T = A^T + B^T$,
2. $(\alpha A)^T = \alpha A^T$,
3. $(AB)^T = B^T A^T$,
4. $(A^{-1})^T = (A^T)^{-1}$.


If a square matrix $S = \|s_{ik}\|_1^n$ coincides with its transpose ($S^T = S$), then it is called symmetric. In a symmetric matrix elements that are symmetrically placed with respect to the main diagonal are equal. Note that the product of two symmetric matrices is not, in general, symmetric. By 3., the product is symmetric if and only if the two given symmetric matrices are permutable.

If a square matrix $K = \|k_{ik}\|_1^n$ differs from its transpose by a factor $-1$ ($K^T = -K$), then it is called skew-symmetric. In a skew-symmetric matrix any two elements that are symmetrically placed with respect to the main diagonal differ from each other by a factor $-1$, and the diagonal elements are zero. From 3. it follows that the product of two permutable skew-symmetric matrices is a symmetric matrix.¹⁶
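The following sketch is ours, with hypothetical names. It checks property 3 on rectangular factors and illustrates the remarks on products of symmetric and of skew-symmetric matrices. The matrix `S3` is chosen as $S_1^2 + E$, a polynomial in $S_1$, so by (22) it commutes with $S_1$.

```python
def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    """A^T: rows become columns."""
    return [list(col) for col in zip(*M)]

def is_symmetric(M):
    return M == transpose(M)

# Property 3 for rectangular factors: (AB)^T = B^T A^T.
A = [[1, 2, 0], [3, -1, 4]]
B = [[2, 1], [0, 5], [-3, 2]]
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True

S1 = [[1, 2], [2, 3]]
S2 = [[0, 1], [1, 0]]
print(is_symmetric(matmul(S1, S2)))   # False: S1 and S2 do not commute
S3 = [[6, 8], [8, 14]]                # S1^2 + E, a polynomial in S1, hence permutable with S1
print(is_symmetric(matmul(S1, S3)))   # True

K1 = [[0, 1], [-1, 0]]
K2 = [[0, 2], [-2, 0]]                # K2 = 2 K1, so K1 and K2 commute
print(matmul(K1, K2))                 # [[-2, 0], [0, -2]], a symmetric matrix
```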


§ 4. Compound Matrices. Minors of the Inverse Matrix 


1. Let $A = \|a_{ik}\|_1^n$ be a given matrix. We consider all possible minors of $A$ of order $p$ $(1 \le p \le n)$:

$$A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} \qquad \left(1 \le i_1 < i_2 < \cdots < i_p \le n;\ 1 \le k_1 < k_2 < \cdots < k_p \le n\right). \qquad (31)$$

The number of these minors is $N^2$, where $N = \binom{n}{p}$ is the number of combinations of $n$ objects taken $p$ at a time. In order to arrange the minors (31) in a square array, we enumerate in some definite order (lexicographic order, for example) all the $N$ combinations of $p$ indices selected from among the indices $1, 2, \ldots, n$.


¹⁵ In formulas 1., 2., 3., $A$ and $B$ are arbitrary rectangular matrices for which the corresponding operations are feasible. In 4., $A$ is an arbitrary square non-singular matrix.

¹⁶ As regards the representation of a square matrix $A$ in the form of a product of two symmetric matrices ($A = S_1S_2$) or two skew-symmetric matrices ($A = K_1K_2$), see [357].




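If the combinations of indices $i_1 < i_2 < \cdots < i_p$ and $k_1 < k_2 < \cdots < k_p$ have the numbers $\alpha$ and $\beta$, then the minors (31) will also be denoted as follows:

$$A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} = \mathfrak{a}_{\alpha\beta}.$$

By giving to $\alpha$ and $\beta$ independently all the values from 1 to $N$, we obtain all the minors of $A = \|a_{ik}\|_1^n$ of order $p$.

The square matrix of order $N$

$$\mathfrak{A}_p = \|\mathfrak{a}_{\alpha\beta}\|_1^N$$

is called the $p$-th compound matrix of $A = \|a_{ik}\|_1^n$; $p$ can take the values $1, 2, \ldots, n$. Here $\mathfrak{A}_1 = A$, and $\mathfrak{A}_n$ consists of the single element $|A|$.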


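Note. The order of enumeration of the combinations of indices is fixed once and for all and does not depend on the choice of $A$.

Example. Let

$$A = \begin{Vmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{Vmatrix}.$$

We enumerate all combinations of the indices 1, 2, 3, 4 taken two at a time by arranging them in the following order:

(12) (13) (14) (23) (24) (34).

Then

$$\mathfrak{A}_2 = \begin{Vmatrix}
A\binom{12}{12} & A\binom{12}{13} & A\binom{12}{14} & A\binom{12}{23} & A\binom{12}{24} & A\binom{12}{34} \\
A\binom{13}{12} & A\binom{13}{13} & A\binom{13}{14} & A\binom{13}{23} & A\binom{13}{24} & A\binom{13}{34} \\
A\binom{14}{12} & A\binom{14}{13} & A\binom{14}{14} & A\binom{14}{23} & A\binom{14}{24} & A\binom{14}{34} \\
A\binom{23}{12} & A\binom{23}{13} & A\binom{23}{14} & A\binom{23}{23} & A\binom{23}{24} & A\binom{23}{34} \\
A\binom{24}{12} & A\binom{24}{13} & A\binom{24}{14} & A\binom{24}{23} & A\binom{24}{24} & A\binom{24}{34} \\
A\binom{34}{12} & A\binom{34}{13} & A\binom{34}{14} & A\binom{34}{23} & A\binom{34}{24} & A\binom{34}{34}
\end{Vmatrix}.$$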
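A compound matrix is easy to build once the enumeration of combinations is fixed. The sketch below is our own. It relies on the documented behaviour of Python's `itertools.combinations`, which emits combinations in lexicographic order. For $n = 4$, $p = 2$ it therefore reproduces the order (12) (13) (14) (23) (24) (34) used above, with 0-based indices.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def compound(M, p):
    """p-th compound matrix of a square matrix M: entry (alpha, beta) is the minor whose rows
    form the alpha-th and whose columns form the beta-th combination (lexicographic order)."""
    combos = list(combinations(range(len(M)), p))
    return [[det([[M[i][k] for k in cols] for i in rows]) for cols in combos] for rows in combos]

A = [[2, 0, 1, 3],
     [1, 1, 0, 2],
     [0, 3, 1, 1],
     [4, 1, 2, 0]]
A2 = compound(A, 2)
print(len(A2), len(A2[0]))           # 6 6, since N = C(4, 2) = 6
print(compound(A, 1) == A)           # True: the first compound is A itself
print(compound(A, 4) == [[det(A)]])  # True: the n-th compound is the single element |A|
print(A2[0][5])                      # 2, the minor A(12; 34) = 1*2 - 3*0
```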




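We mention some properties of compound matrices:

1. From $C = AB$ it follows that $\mathfrak{C}_p = \mathfrak{A}_p \mathfrak{B}_p$ $(p = 1, 2, \ldots, n)$.

For when we express the minors of order $p$ $(1 \le p \le n)$ of the matrix product $C$, by formula (19), in terms of the minors of the same order of the factors, then we have:

$$C\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} = \sum_{1 \le l_1 < l_2 < \cdots < l_p \le n} A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ l_1 & l_2 & \cdots & l_p \end{pmatrix} B\begin{pmatrix} l_1 & l_2 & \cdots & l_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} \qquad \left(1 \le i_1 < \cdots < i_p \le n;\ 1 \le k_1 < \cdots < k_p \le n\right). \qquad (32)$$

Obviously, in the notation of this section, equation (32) can be written as follows:

$$\mathfrak{c}_{\alpha\beta} = \sum_{\lambda=1}^{N} \mathfrak{a}_{\alpha\lambda} \mathfrak{b}_{\lambda\beta} \qquad (\alpha, \beta = 1, 2, \ldots, N)$$

(here $\alpha$, $\beta$, and $\lambda$ are the numbers of the combinations of indices $i_1 < i_2 < \cdots < i_p$; $k_1 < k_2 < \cdots < k_p$; $l_1 < l_2 < \cdots < l_p$). Hence

$$\mathfrak{C}_p = \mathfrak{A}_p \mathfrak{B}_p \qquad (p = 1, 2, \ldots, n).$$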
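Property 1 can be confirmed numerically for every $p$. The sketch below is ours, self-contained, and uses hypothetical helper names. It compares $\mathfrak{C}_p$ with $\mathfrak{A}_p\mathfrak{B}_p$ for two $3 \times 3$ integer matrices, using the same lexicographic enumeration as above.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def compound(M, p):
    """p-th compound matrix, combinations in lexicographic order."""
    combos = list(combinations(range(len(M)), p))
    return [[det([[M[i][k] for k in cols] for i in rows]) for cols in combos] for rows in combos]

A = [[1, 2, 0], [0, 1, 3], [2, -1, 1]]
B = [[3, 0, 1], [1, 2, 0], [0, -1, 4]]
C = matmul(A, B)
for p in range(1, 4):
    print(p, compound(C, p) == matmul(compound(A, p), compound(B, p)))
# 1 True
# 2 True
# 3 True
```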


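2. From $B = A^{-1}$ it follows that $\mathfrak{B}_p = \mathfrak{A}_p^{-1}$ $(p = 1, 2, \ldots, n)$.

This result follows immediately from the preceding one when we set $C = E$ and bear in mind that $\mathfrak{E}_p$ is the unit matrix of order $N = \binom{n}{p}$.

From 2. there follows an important formula that expresses the minors of the inverse matrix in terms of the minors of the given matrix:

If $B = A^{-1}$, then for arbitrary $1 \le i_1 < i_2 < \cdots < i_p \le n$ and $1 \le k_1 < k_2 < \cdots < k_p \le n$

$$B\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} = (-1)^{\sum_{\nu=1}^{p}(i_\nu + k_\nu)} \frac{A\begin{pmatrix} k'_1 & k'_2 & \cdots & k'_{n-p} \\ i'_1 & i'_2 & \cdots & i'_{n-p} \end{pmatrix}}{A\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix}}, \qquad (33)$$

where $i_1 < i_2 < \cdots < i_p$ and $i'_1 < i'_2 < \cdots < i'_{n-p}$ form a complete system of indices $1, 2, \ldots, n$, as do $k_1 < k_2 < \cdots < k_p$ and $k'_1 < k'_2 < \cdots < k'_{n-p}$.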
For it follows from $AB = E$ that

$$\mathfrak{A}_p \mathfrak{B}_p = \mathfrak{E}_p,$$




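or in more explicit form:

$$\sum_{\lambda=1}^{N} \mathfrak{a}_{\alpha\lambda} \mathfrak{b}_{\lambda\beta} = \delta_{\alpha\beta} \qquad (\alpha, \beta = 1, 2, \ldots, N). \qquad (34)$$

Equations (34) can also be written as follows:

$$\sum_{1 \le i_1 < i_2 < \cdots < i_p \le n} A\begin{pmatrix} j_1 & j_2 & \cdots & j_p \\ i_1 & i_2 & \cdots & i_p \end{pmatrix} B\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} = \begin{cases} 1, & \text{if } \sum_{\nu=1}^{p} (j_\nu - k_\nu)^2 = 0, \\ 0, & \text{if } \sum_{\nu=1}^{p} (j_\nu - k_\nu)^2 > 0 \end{cases} \qquad \left(1 \le j_1 < \cdots < j_p \le n;\ 1 \le k_1 < \cdots < k_p \le n\right). \qquad (34')$$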


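On the other hand, when we apply the well-known Laplace expansion to the determinant $|A|$, we obtain

$$\sum_{1 \le i_1 < \cdots < i_p \le n} A\begin{pmatrix} j_1 & \cdots & j_p \\ i_1 & \cdots & i_p \end{pmatrix} (-1)^{\sum_{\nu=1}^{p}(i_\nu + k_\nu)} A\begin{pmatrix} k'_1 & \cdots & k'_{n-p} \\ i'_1 & \cdots & i'_{n-p} \end{pmatrix} = \begin{cases} |A|, & \text{if } \sum_{\nu=1}^{p} (j_\nu - k_\nu)^2 = 0, \\ 0, & \text{if } \sum_{\nu=1}^{p} (j_\nu - k_\nu)^2 > 0, \end{cases} \qquad (35)$$

where $i_1 < \cdots < i_p$ and $i'_1 < \cdots < i'_{n-p}$ form a complete system of indices $1, 2, \ldots, n$, as do $k_1 < \cdots < k_p$ and $k'_1 < \cdots < k'_{n-p}$. Comparison of (35) with (34’) and (34) shows that the equations (34) are satisfied if we take for $\mathfrak{b}_{\lambda\beta} = B\begin{pmatrix} i_1 & \cdots & i_p \\ k_1 & \cdots & k_p \end{pmatrix}$ the quantity

$$(-1)^{\sum_{\nu=1}^{p}(i_\nu + k_\nu)} \frac{A\begin{pmatrix} k'_1 & \cdots & k'_{n-p} \\ i'_1 & \cdots & i'_{n-p} \end{pmatrix}}{A\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix}}.$$

Since the elements $\mathfrak{b}_{\lambda\beta}$ of the inverse matrix of $\mathfrak{A}_p$ are uniquely determined by (34), equation (33) must hold.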
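Formula (33) can be checked exhaustively on a small example. In this sketch, which is our own with hypothetical names, the inverse is computed by (25) in exact arithmetic and indices are 0-based. Shifting to 0-based indices changes $\sum(i_\nu + k_\nu)$ by $2p$, so the sign is unaffected. The determinant of an empty matrix is taken to be 1. Consequently the case $p = n$, where the complementary minor is empty, gives $|B| = 1/|A|$.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion; the empty matrix has determinant 1."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def minor(M, rows, cols):
    """Minor of M formed from the given row and column indices."""
    return det([[M[i][j] for j in cols] for i in rows])

def inverse_by_cofactors(M):
    """A^{-1} by formula (25): entry (i, k) is A_ki / |A|."""
    n, d = len(M), det(M)
    cof = lambda i, k: (-1) ** (i + k) * det([r[:k] + r[k + 1:] for t, r in enumerate(M) if t != i])
    return [[Fraction(cof(k, i), d) for k in range(n)] for i in range(n)]

def jacobi_rhs(A, rows, cols):
    """Right-hand side of (33) for 0-based combinations rows = (i_1..i_p), cols = (k_1..k_p)."""
    n = len(A)
    rows_c = [i for i in range(n) if i not in rows]   # i'_1 < ... < i'_{n-p}
    cols_c = [k for k in range(n) if k not in cols]   # k'_1 < ... < k'_{n-p}
    sign = (-1) ** (sum(rows) + sum(cols))
    # complementary minor of A taken with rows k' and columns i' (the roles are exchanged)
    return Fraction(sign * minor(A, cols_c, rows_c), det(A))

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
B = inverse_by_cofactors(A)
ok = all(minor(B, rows, cols) == jacobi_rhs(A, rows, cols)
         for p in range(1, 4)
         for rows in combinations(range(3), p)
         for cols in combinations(range(3), p))
print(ok)  # True
```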


CHAPTER II 


THE ALGORITHM OF GAUSS AND SOME OF ITS APPLICATIONS 


§ 1. Gauss’s Elimination Method 


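1. Let

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= y_1, \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= y_2, \\ \cdots\cdots\cdots\cdots\cdots\cdots\cdots & \cdots\cdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= y_n \end{aligned} \qquad (1)$$

be a system of $n$ linear equations in $n$ unknowns $x_1, x_2, \ldots, x_n$ with right-hand sides $y_1, y_2, \ldots, y_n$.

In matrix form this system may be written as

$$Ax = y. \qquad (1')$$

Here $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ are columns and $A = \|a_{ik}\|_1^n$ is the square coefficient matrix.

If $A$ is non-singular, then we can rewrite this as

$$x = A^{-1}y, \qquad (2)$$

or in explicit form:

$$x_i = \sum_{k=1}^{n} a_{ik}^{(-1)} y_k \qquad (i = 1, 2, \ldots, n). \qquad (2')$$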


Thus, the task of computing the elements of the inverse matrix $A^{-1} = \|a_{ik}^{(-1)}\|_1^n$ is equivalent to the task of solving the system of equations (1) for arbitrary right-hand sides $y_1, y_2, \ldots, y_n$. The elements of the inverse matrix are determined by the formulas (25) of Chapter I. However, the actual computation of the elements of $A^{-1}$ by these formulas is very tedious for large $n$. Therefore, effective methods of computing the elements of an inverse matrix (and hence of solving a system of linear equations) are of great practical value.¹


¹ For a detailed account of these methods, we refer the reader to the book by Faddeev [15] and the group of papers that appeared in Uspehi Mat. Nauk, Vol. 5, 3 (1950).




24 II. Tae ALGORITHM OF GAUSS AND SOME APPLICATIONS 


In the present chapter we expound the theoretical basis of some of these methods; they are variants of Gauss’s elimination method, with which the reader first became acquainted in his algebra course at school.


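2. Suppose that in the system of equations (1) we have $a_{11} \ne 0$. We eliminate $x_1$ from all the equations beginning with the second by adding to the second equation the first multiplied by $-\dfrac{a_{21}}{a_{11}}$, to the third the first multiplied by $-\dfrac{a_{31}}{a_{11}}$, and so on. The system (1) has now been replaced by the equivalent system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= y_1, \\ a_{22}^{(1)}x_2 + \cdots + a_{2n}^{(1)}x_n &= y_2^{(1)}, \\ \cdots\cdots\cdots\cdots\cdots\cdots & \cdots\cdots \\ a_{n2}^{(1)}x_2 + \cdots + a_{nn}^{(1)}x_n &= y_n^{(1)}. \end{aligned} \qquad (3)$$

The coefficients of the unknowns and the constant terms of the last $n - 1$ equations are given by the formulas

$$a_{ij}^{(1)} = a_{ij} - \frac{a_{i1}}{a_{11}} a_{1j}, \qquad y_i^{(1)} = y_i - \frac{a_{i1}}{a_{11}} y_1 \qquad (i, j = 2, \ldots, n). \qquad (3')$$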


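Suppose that $a_{22}^{(1)} \ne 0$. Then we eliminate $x_2$ in the same way from the last $n - 2$ equations of the system (3) and obtain the system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= y_1, \\ a_{22}^{(1)}x_2 + a_{23}^{(1)}x_3 + \cdots + a_{2n}^{(1)}x_n &= y_2^{(1)}, \\ a_{33}^{(2)}x_3 + \cdots + a_{3n}^{(2)}x_n &= y_3^{(2)}, \\ \cdots\cdots\cdots\cdots\cdots\cdots & \cdots\cdots \\ a_{n3}^{(2)}x_3 + \cdots + a_{nn}^{(2)}x_n &= y_n^{(2)}. \end{aligned} \qquad (4)$$

The new coefficients and the new right-hand sides are connected with the preceding ones by the formulas:

$$a_{ij}^{(2)} = a_{ij}^{(1)} - \frac{a_{i2}^{(1)}}{a_{22}^{(1)}} a_{2j}^{(1)}, \qquad y_i^{(2)} = y_i^{(1)} - \frac{a_{i2}^{(1)}}{a_{22}^{(1)}} y_2^{(1)} \qquad (i, j = 3, \ldots, n). \qquad (5)$$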

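Continuing the algorithm, we go in $n - 1$ steps from the original system (1) to the triangular recurrent system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= y_1, \\ a_{22}^{(1)}x_2 + a_{23}^{(1)}x_3 + \cdots + a_{2n}^{(1)}x_n &= y_2^{(1)}, \\ a_{33}^{(2)}x_3 + \cdots + a_{3n}^{(2)}x_n &= y_3^{(2)}, \\ \cdots\cdots\cdots\cdots\cdots\cdots & \cdots\cdots \\ a_{nn}^{(n-1)}x_n &= y_n^{(n-1)}. \end{aligned} \qquad (6)$$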




This reduction can be carried out if and only if in the process all the numbers $a_{11}, a_{22}^{(1)}, a_{33}^{(2)}, \ldots, a_{n-1,n-1}^{(n-2)}$ turn out to be different from zero.

This algorithm of Gauss consists of operations of a simple type such as can easily be carried out by present-day computing machines.
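As an illustration of how simple these operations are, here is a minimal Python sketch of the elimination (3’), (5), and so on, followed by back substitution in the triangular system (6). It is our own code, not taken from the text. It performs no row interchanges, exactly as described above, and therefore stops when some pivot $a_{11}, a_{22}^{(1)}, \ldots$ vanishes. Exact rational arithmetic (`fractions.Fraction`) avoids rounding questions.

```python
from fractions import Fraction

def gauss_solve(A, y):
    """Solve A x = y by Gauss's elimination without row interchanges:
    forward elimination to the triangular form (6), then back substitution."""
    n = len(A)
    a = [[Fraction(v) for v in row] for row in A]
    b = [Fraction(v) for v in y]
    for p in range(n - 1):
        if a[p][p] == 0:
            raise ValueError(f"pivot {p + 1} is zero; no row interchanges are made")
        for i in range(p + 1, n):
            f = a[i][p] / a[p][p]            # multiplier a_ip / a_pp (current step)
            for j in range(p, n):
                a[i][j] -= f * a[p][j]
            b[i] -= f * b[p]                 # constant terms are transformed in the same way
    if a[n - 1][n - 1] == 0:
        raise ValueError("the last pivot is zero: the matrix is singular")
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):           # back substitution in (6)
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x

x = gauss_solve([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])
print([int(v) for v in x])  # [2, 3, -1]

try:
    gauss_solve([[0, 1], [1, 0]], [1, 1])
except ValueError as err:
    print(err)  # pivot 1 is zero; no row interchanges are made
```

The second system is non-singular, but it has $a_{11} = 0$, so the algorithm in the form given here cannot start. Interchanging the two equations would remove this obstacle.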


3. Let us express the coefficients and the right-hand sides of the reduced system in terms of the coefficients and the right-hand sides of the original system (1). We shall not assume here that in the reduction process all the numbers $a_{11}, a_{22}^{(1)}, \ldots, a_{n-1,n-1}^{(n-2)}$ turn out to be different from zero; we consider the general case, in which the first $p$ of these numbers are different from zero:

$$a_{11} \ne 0, \quad a_{22}^{(1)} \ne 0, \quad \ldots, \quad a_{pp}^{(p-1)} \ne 0 \qquad (p \le n - 1). \qquad (7)$$


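This enables us (at the $p$-th step of the reduction) to put the original system of equations into the form

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p + a_{1,p+1}x_{p+1} + \cdots + a_{1n}x_n &= y_1, \\
a_{22}^{(1)}x_2 + \cdots + a_{2p}^{(1)}x_p + a_{2,p+1}^{(1)}x_{p+1} + \cdots + a_{2n}^{(1)}x_n &= y_2^{(1)}, \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \cdots\cdots \\
a_{pp}^{(p-1)}x_p + a_{p,p+1}^{(p-1)}x_{p+1} + \cdots + a_{pn}^{(p-1)}x_n &= y_p^{(p-1)}, \\
a_{p+1,p+1}^{(p)}x_{p+1} + \cdots + a_{p+1,n}^{(p)}x_n &= y_{p+1}^{(p)}, \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \cdots\cdots \\
a_{n,p+1}^{(p)}x_{p+1} + \cdots + a_{nn}^{(p)}x_n &= y_n^{(p)}.
\end{aligned} \qquad (8)$$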


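We denote the coefficient matrix of this system of equations by $G_p$:

$$G_p = \begin{Vmatrix}
a_{11} & a_{12} & \cdots & a_{1p} & a_{1,p+1} & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & \cdots & a_{2p}^{(1)} & a_{2,p+1}^{(1)} & \cdots & a_{2n}^{(1)} \\
\vdots & & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & a_{pp}^{(p-1)} & a_{p,p+1}^{(p-1)} & \cdots & a_{pn}^{(p-1)} \\
0 & 0 & \cdots & 0 & a_{p+1,p+1}^{(p)} & \cdots & a_{p+1,n}^{(p)} \\
\vdots & & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & a_{n,p+1}^{(p)} & \cdots & a_{nn}^{(p)}
\end{Vmatrix}. \qquad (9)$$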


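The transition from $A$ to $G_p$ is effected as follows: To every row of $A$ in succession from the second to the $n$-th there are added some preceding rows (from the first $p$) multiplied by certain factors. Therefore all the minors of order $h$ contained in the first $h$ rows of $A$ and $G_p$ are equal:

$$A\begin{pmatrix} 1 & 2 & \cdots & h \\ k_1 & k_2 & \cdots & k_h \end{pmatrix} = G_p\begin{pmatrix} 1 & 2 & \cdots & h \\ k_1 & k_2 & \cdots & k_h \end{pmatrix} \qquad \left(1 \le k_1 < k_2 < \cdots < k_h \le n;\ h = 1, 2, \ldots, n\right). \qquad (10)$$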




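From these formulas we find, by taking into account the structure (9) of $G_p$,

$$A\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix} = a_{11} a_{22}^{(1)} \cdots a_{pp}^{(p-1)}, \qquad (11)$$

$$A\begin{pmatrix} 1 & 2 & \cdots & p & i \\ 1 & 2 & \cdots & p & k \end{pmatrix} = a_{11} a_{22}^{(1)} \cdots a_{pp}^{(p-1)} a_{ik}^{(p)} \qquad (i, k = p + 1, \ldots, n). \qquad (12)$$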


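When we divide the second of these equations by the first, we obtain the fundamental formulas²

$$a_{ik}^{(p)} = \frac{A\begin{pmatrix} 1 & 2 & \cdots & p & i \\ 1 & 2 & \cdots & p & k \end{pmatrix}}{A\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix}} \qquad (i, k = p + 1, \ldots, n). \qquad (13)$$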
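Formula (13) says that the entries produced after $p$ steps are ratios of bordered leading minors. The sketch below is ours, and `gauss_steps` is a hypothetical helper that performs $p$ steps without interchanges. It compares the two descriptions for every admissible $p$ on a sample matrix, chosen arbitrarily so that its leading principal minors are non-zero.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def minor(M, rows, cols):
    """Minor of M formed from the given row and column indices."""
    return det([[M[i][j] for j in cols] for i in rows])

def gauss_steps(A, p):
    """Matrix G_p after p steps of Gauss's algorithm without row interchanges."""
    a = [[Fraction(v) for v in row] for row in A]
    n = len(a)
    for s in range(p):
        if a[s][s] == 0:
            raise ValueError(f"pivot number {s + 1} is zero")
        for i in range(s + 1, n):
            f = a[i][s] / a[s][s]
            for j in range(s, n):
                a[i][j] -= f * a[s][j]
    return a

A = [[4, 1, 2, 0], [1, 3, 0, 1], [2, 0, 5, 1], [0, 1, 1, 3]]
n = len(A)
for p in range(1, n):
    G = gauss_steps(A, p)
    lead = list(range(p))  # 0-based version of 1, 2, ..., p
    same = all(G[i][k] == Fraction(minor(A, lead + [i], lead + [k]), minor(A, lead, lead))
               for i in range(p, n) for k in range(p, n))
    print(p, same)
# 1 True
# 2 True
# 3 True
```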


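If the conditions (7) hold for a given value of $p$, then they also hold for every smaller value of $p$. Therefore the formulas (13) are valid not only for the given value of $p$ but also for all smaller values of $p$. The same holds true of (11). Hence instead of this formula we can write the equations

$$A\begin{pmatrix} 1 \\ 1 \end{pmatrix} = a_{11}, \quad A\begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix} = a_{11}a_{22}^{(1)}, \quad A\begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix} = a_{11}a_{22}^{(1)}a_{33}^{(2)}, \quad \ldots. \qquad (14)$$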


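Thus, the conditions (7), i.e., the necessary and sufficient conditions for the feasibility of the first $p$ steps in Gauss’s algorithm, can be written in the form of the following inequalities:

$$A\begin{pmatrix} 1 \\ 1 \end{pmatrix} \ne 0, \quad A\begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix} \ne 0, \quad \ldots, \quad A\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix} \ne 0. \qquad (15)$$

From (14) we then find:

$$a_{11} = A\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad a_{22}^{(1)} = \frac{A\begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}}{A\begin{pmatrix} 1 \\ 1 \end{pmatrix}}, \quad a_{33}^{(2)} = \frac{A\begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}}{A\begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}}, \quad \ldots, \quad a_{pp}^{(p-1)} = \frac{A\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix}}{A\begin{pmatrix} 1 & 2 & \cdots & p-1 \\ 1 & 2 & \cdots & p-1 \end{pmatrix}}. \qquad (16)$$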


In order to eliminate $x_1, x_2, \ldots, x_p$ consecutively by Gauss’s algorithm it is necessary that all the values (16) should be different from zero, i.e., that the inequalities (15) should hold. However, the formulas (13) for $a_{ik}^{(p)}$ make sense if only the last of the conditions (15) holds.
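Formulas (14) and (16) can be checked in the same way. This sketch is our own, and its sample matrix is arbitrary apart from having non-zero leading principal minors. It collects the pivots of Gauss’s algorithm and compares them with ratios of consecutive leading principal minors.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion; the empty matrix has determinant 1."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def leading_minor(A, j):
    """A(1 2 ... j; 1 2 ... j); equals 1 for j = 0 by the empty-determinant convention."""
    return det([row[:j] for row in A[:j]])

def pivots(A):
    """Pivots a_11, a_22^(1), ..., a_nn^(n-1) of Gauss's algorithm without interchanges."""
    a = [[Fraction(v) for v in row] for row in A]
    n = len(a)
    for s in range(n - 1):
        if a[s][s] == 0:
            raise ValueError(f"pivot number {s + 1} is zero; condition (15) fails")
        for i in range(s + 1, n):
            f = a[i][s] / a[s][s]
            for j in range(s, n):
                a[i][j] -= f * a[s][j]
    return [a[s][s] for s in range(n)]

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
piv = pivots(A)
print([int(x) for x in piv])                          # [2, 1, 2]
print([leading_minor(A, j) for j in range(1, 4)])     # [2, 2, 4]
print(all(piv[s] == Fraction(leading_minor(A, s + 1), leading_minor(A, s)) for s in range(3)))  # True
```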


² See [181], p. 89.




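4. Suppose the coefficient matrix of the system of equations (1) to be of rank $r$. Then, by a suitable permutation of the equations and a renumbering of the unknowns, we can arrange that the following inequalities hold:

$$A\begin{pmatrix} 1 & 2 & \cdots & j \\ 1 & 2 & \cdots & j \end{pmatrix} \ne 0 \qquad (j = 1, 2, \ldots, r). \qquad (17)$$

This enables us to eliminate $x_1, x_2, \ldots, x_r$ consecutively and to obtain the system of equations

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= y_1, \\
a_{22}^{(1)}x_2 + \cdots + a_{2n}^{(1)}x_n &= y_2^{(1)}, \\
\cdots\cdots\cdots\cdots\cdots\cdots & \cdots\cdots \\
a_{rr}^{(r-1)}x_r + \cdots + a_{rn}^{(r-1)}x_n &= y_r^{(r-1)}, \\
a_{r+1,r+1}^{(r)}x_{r+1} + \cdots + a_{r+1,n}^{(r)}x_n &= y_{r+1}^{(r)}, \\
\cdots\cdots\cdots\cdots\cdots\cdots & \cdots\cdots \\
a_{n,r+1}^{(r)}x_{r+1} + \cdots + a_{nn}^{(r)}x_n &= y_n^{(r)}.
\end{aligned} \qquad (18)$$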


Here the coefficients are determined by the formulas (13). From these formulas it follows, because the rank of the matrix $A = \|a_{ik}\|_1^n$ is equal to $r$ (so that every minor of order $r + 1$ vanishes), that

$$a_{ik}^{(r)} = 0 \qquad (i, k = r + 1, \ldots, n). \qquad (19)$$

Therefore the last $n - r$ equations (18) reduce to the consistency conditions

$$y_s^{(r)} = 0 \qquad (s = r + 1, \ldots, n). \qquad (20)$$


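Note that in the elimination algorithm the column of constant terms is subjected to the same transformations as the other columns, of coefficients. Therefore, by supplementing the matrix $A = \|a_{ik}\|_1^n$ with an $(n+1)$-th column of the constant terms ($a_{i,n+1} = y_i$) we obtain:

$$y_i^{(p)} = \frac{A\begin{pmatrix} 1 & 2 & \cdots & p & i \\ 1 & 2 & \cdots & p & n+1 \end{pmatrix}}{A\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix}} \qquad (i = p + 1, \ldots, n;\ p = 1, 2, \ldots, r). \qquad (21)$$

In particular, the consistency conditions (20) reduce to the well-known equations

$$A\begin{pmatrix} 1 & \cdots & r & s \\ 1 & \cdots & r & n+1 \end{pmatrix} = 0 \qquad (s = r + 1, \ldots, n). \qquad (22)$$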
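The consistency conditions can be illustrated on a system of rank $r = 2$. In the sketch below, which is ours and uses hypothetical names, the column of constant terms is carried along with the coefficients, as the text describes. The resulting $y_3^{(2)}$ is then compared with the ratio of minors in (21).

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def minor(M, rows, cols):
    """Minor of M formed from the given row and column indices."""
    return det([[M[i][j] for j in cols] for i in rows])

def reduce_augmented(A, y, r):
    """Carry out r steps of Gauss's algorithm on the augmented matrix [A | y];
    the column of constant terms is transformed like every other column."""
    a = [[Fraction(v) for v in row] + [Fraction(yi)] for row, yi in zip(A, y)]
    n = len(a)
    for s in range(r):
        if a[s][s] == 0:
            raise ValueError(f"pivot number {s + 1} is zero")
        for i in range(s + 1, n):
            f = a[i][s] / a[s][s]
            for j in range(s, n + 1):
                a[i][j] -= f * a[s][j]
    return a

A = [[1, 2, 3], [2, 5, 7], [3, 7, 10]]   # third row = first + second, so r = 2
for y in ([1, 2, 3], [1, 2, 4]):
    G = reduce_augmented(A, y, 2)
    aug = [row + [yi] for row, yi in zip(A, y)]
    by_minors = Fraction(minor(aug, [0, 1, 2], [0, 1, 3]), minor(aug, [0, 1], [0, 1]))  # (21)
    print(G[2][2], G[2][3], G[2][3] == by_minors)
# 0 0 True    consistent, since y_3 = y_1 + y_2
# 0 1 True    inconsistent: condition (20) fails
```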




If $n = r$, i.e., if the matrix $A = \|a_{ik}\|_1^n$ is non-singular, and

$$A\begin{pmatrix} 1 & 2 & \cdots & j \\ 1 & 2 & \cdots & j \end{pmatrix} \ne 0 \qquad (j = 1, 2, \ldots, n),$$

then we can eliminate $x_1, x_2, \ldots, x_{n-1}$ in succession by means of Gauss’s algorithm and reduce the system of equations to the form (6).


§ 2. Mechanical Interpretation of Gauss’s Algorithm 


1. We consider an arbitrary elastic statical system $S$ supported on edges (for example, a string, a rod, a multispan rod, a membrane, a lamina, or a discrete system) and choose $n$ points $(1), (2), \ldots, (n)$ on it. We shall consider the displacements (sags) $y_1, y_2, \ldots, y_n$ of the points $(1), (2), \ldots, (n)$ of $S$ under the action of forces $F_1, F_2, \ldots, F_n$ applied at these points.


Fig. 2


We assume that the forces and the displacements are parallel to one and the same direction and are determined, therefore, by their algebraic magnitudes (Fig. 1). Moreover, we assume the principle of linear superposition of forces:

1. Under the combined action of two systems of forces the corresponding displacements are added together.
2. When the magnitudes of all the forces are multiplied by one and the same real number, then all the displacements are multiplied by the same number.




We denote by $a_{ik}$ the coefficient of influence of the point $(k)$ on the point $(i)$, i.e., the displacement of $(i)$ under the action of a unit force applied at $(k)$ $(i, k = 1, 2, \ldots, n)$ (Fig. 2). Then under the combined action of the forces $F_1, F_2, \ldots, F_n$ the displacements $y_1, y_2, \ldots, y_n$ are determined by the formulas

$$\sum_{k=1}^{n} a_{ik} F_k = y_i \qquad (i = 1, 2, \ldots, n). \qquad (23)$$

Comparing (23) with the original system (1), we can interpret the task of solving the system of equations (1) as follows:

The displacements $y_1, y_2, \ldots, y_n$ being given, we are required to find the corresponding forces $F_1, F_2, \ldots, F_n$.

We denote by $S_p$ the statical system that is obtained from $S$ by introducing $p$ fixed hinged supports at the points $(1), (2), \ldots, (p)$ $(p < n)$. We denote the coefficients of influence for the remaining movable points $(p+1), \ldots, (n)$ of the system $S_p$ by

$$a_{ik}^{(p)} \qquad (i, k = p + 1, \ldots, n)$$

(see Fig. 3 for $p = 1$).


Fig. 3 


The coefficient $a_{ik}^{(p)}$ can be regarded as the displacement at the point $(i)$ of $S$ under the action of a unit force at $(k)$ and of the reactions $R_1, R_2, \ldots, R_p$ at the fixed points $(1), (2), \ldots, (p)$. Therefore

$$a_{ik}^{(p)} = R_1 a_{i1} + \cdots + R_p a_{ip} + a_{ik}. \qquad (24)$$

On the other hand, under the same forces the displacements of the system $S$ at the points $(1), (2), \ldots, (p)$ are zero:

$$\begin{aligned} R_1 a_{11} + \cdots + R_p a_{1p} + a_{1k} &= 0, \\ \cdots\cdots\cdots\cdots\cdots\cdots & \cdots \\ R_1 a_{p1} + \cdots + R_p a_{pp} + a_{pk} &= 0. \end{aligned} \qquad (25)$$




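If

$$A\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix} \ne 0,$$

then we can determine $R_1, R_2, \ldots, R_p$ from (25) and substitute the expressions so obtained in (24). This elimination of $R_1, R_2, \ldots, R_p$ can be carried out as follows. To the system of equations (25) we adjoin (24) written in the form

$$R_1 a_{i1} + \cdots + R_p a_{ip} + a_{ik} - a_{ik}^{(p)} = 0. \qquad (24')$$

Regarding (25) and (24’) as a system of $p + 1$ homogeneous equations with the non-zero solution $R_1, R_2, \ldots, R_p, R_{p+1} = 1$, we see that the determinant of the system must be zero:

$$\begin{vmatrix} a_{11} & \cdots & a_{1p} & a_{1k} \\ \vdots & & \vdots & \vdots \\ a_{p1} & \cdots & a_{pp} & a_{pk} \\ a_{i1} & \cdots & a_{ip} & a_{ik} - a_{ik}^{(p)} \end{vmatrix} = 0.$$

Hence

$$a_{ik}^{(p)} = \frac{A\begin{pmatrix} 1 & 2 & \cdots & p & i \\ 1 & 2 & \cdots & p & k \end{pmatrix}}{A\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix}} \qquad (i, k = p + 1, \ldots, n). \qquad (26)$$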


These formulas express the coefficients of influence of the ‘support’ system $S_p$ in terms of those of the original system $S$.

But formulas (26) coincide with formulas (13) of the preceding section. Therefore for every $p$ $(\le n - 1)$ the coefficients $a_{ik}^{(p)}$ $(i, k = p + 1, \ldots, n)$ in the algorithm of Gauss are the coefficients of influence of the support system $S_p$.
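The mechanical route to $a_{ik}^{(p)}$ can be followed literally in code: solve (25) for the reactions and substitute them into (24). The sketch below is ours. It solves (25) by Cramer's rule and uses a made-up symmetric matrix as the influence coefficients; no physical system is implied. It then compares the result with $p$ steps of Gauss’s algorithm. Indices are 0-based.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def support_coefficient(A, p, i, k):
    """a_ik^(p) via (24)-(25): find the reactions R_1..R_p keeping points (1)..(p) fixed
    under a unit force at (k) (Cramer's rule on (25)), then add their effect at (i).
    Requires A(1..p; 1..p) != 0."""
    P = [row[:p] for row in A[:p]]
    D = det(P)
    rhs = [-A[j][k] for j in range(p)]
    R = []
    for l in range(p):
        Pl = [row[:l] + [rhs[j]] + row[l + 1:] for j, row in enumerate(P)]  # column l replaced
        R.append(Fraction(det(Pl), D))
    return sum(R[l] * A[i][l] for l in range(p)) + A[i][k]

def gauss_steps(A, p):
    """Matrix G_p after p steps of Gauss's algorithm without row interchanges."""
    a = [[Fraction(v) for v in row] for row in A]
    for s in range(p):
        for i in range(s + 1, len(a)):
            f = a[i][s] / a[s][s]
            for j in range(s, len(a)):
                a[i][j] -= f * a[s][j]
    return a

A = [[4, 1, 2, 0], [1, 3, 0, 1], [2, 0, 5, 1], [0, 1, 1, 3]]  # made-up influence coefficients
n = len(A)
agree = all(support_coefficient(A, p, i, k) == gauss_steps(A, p)[i][k]
            for p in range(1, n) for i in range(p, n) for k in range(p, n))
print(agree)  # True
```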

The truth of this fundamental proposition can also be ascertained by purely mechanical considerations without recourse to the algebraic derivation of formulas (13). For this purpose we consider, to begin with, the special case of a single support: $p = 1$ (Fig. 3). In this case, the coefficients of influence of the system $S_1$ are given by the formulas (we put $p = 1$ in (26)):

$$a_{ik}^{(1)} = \frac{A\begin{pmatrix} 1 & i \\ 1 & k \end{pmatrix}}{A\begin{pmatrix} 1 \\ 1 \end{pmatrix}} = a_{ik} - \frac{a_{i1}a_{1k}}{a_{11}} \qquad (i, k = 2, \ldots, n).$$

These formulas coincide with the formulas (3’).


§ 3. SYLVESTER’s DEvERMINANT IDENTITY 31 


Thus, if the coefficients $a_{ik}$ $(i, k = 1, 2, \ldots, n)$ in the system of equations (1) are the coefficients of influence of the statical system $S$, then the coefficients $a_{ik}^{(1)}$ $(i, k = 2, \ldots, n)$ in Gauss’s algorithm are the coefficients of influence of the system $S_1$. Applying the same reasoning to the system $S_1$ and introducing a second support at the point $(2)$ in this system, we see that the coefficients $a_{ik}^{(2)}$ $(i, k = 3, \ldots, n)$ in the system of equations (4) are the coefficients of influence of the support system $S_2$ and, in general, for every $p$ $(\le n - 1)$ the coefficients $a_{ik}^{(p)}$ $(i, k = p + 1, \ldots, n)$ in Gauss’s algorithm are the coefficients of influence of the support system $S_p$.

From mechanical considerations it is clear that the successive introduction of $p$ supports is equivalent to the simultaneous introduction of these supports.


Note. We wish to point out that in the mechanical interpretation of the elimination algorithm it was not necessary to assume that the points at which the displacements are investigated coincide with the points at which the forces $F_1, F_2, \dots, F_n$ are applied. We can assume that $y_1, y_2, \dots, y_n$ are the displacements of the points (1), (2), ..., (n) and that the forces $F_1, F_2, \dots, F_n$ are applied at the points (1'), (2'), ..., (n'). Then $a_{ik}$ is the coefficient of influence of the point (k') on the point (i). In that case we must consider, instead of the support at the point (j), a generalized support at the points (j), (j') under which the displacement at the point (j) is maintained all the time equal to zero at the expense of a suitably chosen auxiliary force $R_j$ at the point (j'). The conditions that allow us to introduce p generalized supports at the points (1), (1'); (2), (2'); ...; (p), (p'), i.e., that allow us to satisfy the conditions $y_1 = 0, y_2 = 0, \dots, y_p = 0$ for arbitrary $F_{p+1}, \dots, F_n$ at the expense of suitable $R_1 = F_1, \dots, R_p = F_p$, can be expressed by the inequality

$$A\begin{pmatrix}1&2&\cdots&p\\1&2&\cdots&p\end{pmatrix} \neq 0.$$
.—p 


§ 3. Sylvester’s Determinant Identity 


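1. In § 1, a comparison of the matrices A and $G_p$ led to equations (10) and (11).

These equations enable us to give an easy proof of the important determinant identity of Sylvester. For from (10) and (11) we find:

$$|A| = A\begin{pmatrix}1&2&\cdots&n\\1&2&\cdots&n\end{pmatrix} = A\begin{pmatrix}1&2&\cdots&p\\1&2&\cdots&p\end{pmatrix} \begin{vmatrix} a_{p+1,p+1}^{(p)} & \cdots & a_{p+1,n}^{(p)}\\ \vdots & & \vdots\\ a_{n,p+1}^{(p)} & \cdots & a_{nn}^{(p)} \end{vmatrix}. \qquad (27)$$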

Gnp+i °° * Gan 




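We introduce borderings of the minor $A\begin{pmatrix}1&2&\cdots&p\\1&2&\cdots&p\end{pmatrix}$ by the determinants

$$b_{ik} = A\begin{pmatrix}1&2&\cdots&p&i\\1&2&\cdots&p&k\end{pmatrix} \qquad (i, k = p+1, \dots, n).$$

The matrix formed from these determinants will be denoted by

$$B = \| b_{ik} \|_{p+1}^{n}.$$

Then by formulas (13)

$$\begin{vmatrix} a_{p+1,p+1}^{(p)} & \cdots & a_{p+1,n}^{(p)}\\ \vdots & & \vdots\\ a_{n,p+1}^{(p)} & \cdots & a_{nn}^{(p)} \end{vmatrix} = \frac{\begin{vmatrix} b_{p+1,p+1} & \cdots & b_{p+1,n}\\ \vdots & & \vdots\\ b_{n,p+1} & \cdots & b_{nn} \end{vmatrix}}{\left[A\begin{pmatrix}1&2&\cdots&p\\1&2&\cdots&p\end{pmatrix}\right]^{n-p}} = \frac{|B|}{\left[A\begin{pmatrix}1&2&\cdots&p\\1&2&\cdots&p\end{pmatrix}\right]^{n-p}}.$$

Therefore equation (27) can be rewritten as follows:

$$|B| = |A| \left[A\begin{pmatrix}1&2&\cdots&p\\1&2&\cdots&p\end{pmatrix}\right]^{n-p-1}. \qquad (28)$$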


This is Sylvester's determinant identity. It expresses the determinant |B| formed from the bordered determinants in terms of the original determinant and the bordered minor.
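Sylvester's identity (28) can be checked in the same manner. The sketch below builds the matrix B of bordered minors and compares the two sides of (28). It uses exact `Fraction` arithmetic, hypothetical helper names and 0-based indices. The second call uses a matrix with $a_{11} = 0$, so condition (29) fails there, yet the identity still holds, as the continuity argument that follows asserts.

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination with row interchanges."""
    a = [[Fraction(x) for x in row] for row in M]
    n, result = len(a), Fraction(1)
    for k in range(n):
        piv = next((i for i in range(k, n) if a[i][k] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != k:
            a[k], a[piv] = a[piv], a[k]
            result = -result
        result *= a[k][k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return result

def minor(A, rows, cols):
    """The minor of A formed from the given (0-based) rows and columns."""
    return det([[A[i][j] for j in cols] for i in rows])

def sylvester_holds(A, p):
    """Check (28): |B| = |A| * D_p**(n-p-1), b_ik being the bordered minors."""
    n, lead = len(A), list(range(p))
    B = [[minor(A, lead + [i], lead + [k]) for k in range(p, n)]
         for i in range(p, n)]
    D_p = minor(A, lead, lead)
    return det(B) == det(A) * D_p ** (n - p - 1)

A = [[2, 1, 3, 1], [4, 3, 1, 2], [2, 5, 2, 1], [6, 1, 4, 5]]
print([sylvester_holds(A, p) for p in (1, 2, 3)])  # [True, True, True]

A0 = [[0, 1, 2], [1, 0, 3], [4, 5, 6]]   # D_1 = 0, so (29) fails
print(sylvester_holds(A0, 1))             # True
```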


We have established equation (28) for a matrix $A = \| a_{ik} \|_1^n$ whose elements satisfy the inequalities

$$A\begin{pmatrix}1&2&\cdots&j\\1&2&\cdots&j\end{pmatrix} \neq 0 \qquad (j = 1, 2, \dots, p). \qquad (29)$$

However, we can show by a 'continuity argument' that this restriction may be removed and that Sylvester's identity holds for an arbitrary matrix $A = \| a_{ik} \|_1^n$. For suppose that the inequalities (29) do not hold. We introduce the matrix

$$A_\varepsilon = A + \varepsilon E.$$


Obviously $\lim_{\varepsilon \to 0} A_\varepsilon = A$. On the other hand, the minors

$$A_\varepsilon\begin{pmatrix}1&2&\cdots&j\\1&2&\cdots&j\end{pmatrix} \qquad (j = 1, 2, \dots, p)$$




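are p polynomials in ε that do not vanish identically (in each of them the coefficient of $\varepsilon^j$ is 1). Therefore we can choose a sequence $\varepsilon_m \to 0$ such that

$$A_{\varepsilon_m}\begin{pmatrix}1&2&\cdots&j\\1&2&\cdots&j\end{pmatrix} \neq 0 \qquad (j = 1, 2, \dots, p;\ m = 1, 2, \dots).$$

We can write down the identity (28) for the matrices $A_{\varepsilon_m}$. Taking the limit $m \to \infty$ on both sides of this identity, we obtain Sylvester's identity for the limit matrix³ $A = \lim_{m \to \infty} A_{\varepsilon_m}$.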


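If we apply the identity (28) to the determinant

$$A\begin{pmatrix}1&2&\cdots&p&i_1&i_2&\cdots&i_q\\1&2&\cdots&p&k_1&k_2&\cdots&k_q\end{pmatrix} \qquad (p < i_1 < i_2 < \cdots < i_q \le n;\ p < k_1 < k_2 < \cdots < k_q \le n),$$

then we obtain a form of Sylvester's identity that is particularly convenient for applications:

$$B\begin{pmatrix}i_1&i_2&\cdots&i_q\\k_1&k_2&\cdots&k_q\end{pmatrix} = \left[A\begin{pmatrix}1&2&\cdots&p\\1&2&\cdots&p\end{pmatrix}\right]^{q-1} A\begin{pmatrix}1&2&\cdots&p&i_1&i_2&\cdots&i_q\\1&2&\cdots&p&k_1&k_2&\cdots&k_q\end{pmatrix}. \qquad (30)$$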


§ 4. The Decomposition of a Square Matrix into Triangular Factors 


1. Let $A = \| a_{ik} \|_1^n$ be a given matrix of rank r. We introduce the following notation for the successive principal minors of the matrix:

$$D_k = A\begin{pmatrix}1&2&\cdots&k\\1&2&\cdots&k\end{pmatrix} \qquad (k = 1, 2, \dots, n).$$

Let us assume that the conditions for the feasibility of Gauss's algorithm are satisfied:

$$D_k \neq 0 \qquad (k = 1, 2, \dots, r).$$

We denote by G the coefficient matrix of the system of equations (18) to which the system

$$\sum_{k=1}^{n} a_{ik} x_k = y_i \qquad (i = 1, 2, \dots, n)$$


3 By the limit (for $p \to \infty$) of a sequence of matrices $X_p = \| x_{ik}^{(p)} \|$ we mean the matrix $X = \| x_{ik} \|$, where $x_{ik} = \lim_{p \to \infty} x_{ik}^{(p)}$ ($i, k = 1, 2, \dots, n$).




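has been reduced by the elimination method of Gauss. The matrix G is of upper triangular form, and the elements of its first r rows are determined by the formulas (13), while the elements of the last $n - r$ rows are all equal to zero:⁴

$$G = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1r} & a_{1,r+1} & \cdots & a_{1n}\\ 0 & a_{22}^{(1)} & \cdots & a_{2r}^{(1)} & a_{2,r+1}^{(1)} & \cdots & a_{2n}^{(1)}\\ \vdots & & \ddots & \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & a_{rr}^{(r-1)} & a_{r,r+1}^{(r-1)} & \cdots & a_{rn}^{(r-1)}\\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0\\ \vdots & & & \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix}.$$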


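The transition from A to G is effected by a certain number N of operations of the following type: to the i-th row of the matrix we add the j-th row ($j < i$), after a preliminary multiplication by some number α. Such an operation is equivalent to the multiplication on the left of the matrix to be transformed by the matrix

$$\begin{pmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0\\
\vdots & \ddots & \vdots & & \vdots & & \vdots\\
0 & \cdots & 1 & \cdots & 0 & \cdots & 0\\
\vdots & & \vdots & \ddots & \vdots & & \vdots\\
0 & \cdots & \alpha & \cdots & 1 & \cdots & 0\\
\vdots & & \vdots & & \vdots & \ddots & \vdots\\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix} \qquad (31)$$

(the element α stands in the i-th row and the j-th column). In this matrix the main diagonal consists entirely of units, and all the remaining elements, except α, are zero.

Thus,

$$G = W_N \cdots W_2 W_1 A,$$

where each matrix $W_1, W_2, \dots, W_N$ is of the form (31) and is therefore a lower triangular matrix with diagonal elements equal to 1.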


4 See formulas (19). G coincides with the matrix $G_p$ (p. 25) for p = r.




Let

$$W = W_N \cdots W_2 W_1. \qquad (32)$$

Then

$$G = WA. \qquad (33)$$

We shall call W the transforming matrix for A in Gauss's elimination method. Both matrices G and W are uniquely determined by A. From (32) it follows that W is lower triangular with diagonal elements equal to 1.

Since W is non-singular, we obtain from (33):

$$A = W^{-1} G. \qquad (33')$$


We have thus represented A in the form of a product of a lower triangular matrix $W^{-1}$ and an upper triangular matrix G. The problem of decomposing a matrix A into factors of this type is completely answered by the following theorem:


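THEOREM 1: Every matrix $A = \| a_{ik} \|_1^n$ of rank r in which the first r successive principal minors are different from zero,

$$D_k = A\begin{pmatrix}1&2&\cdots&k\\1&2&\cdots&k\end{pmatrix} \neq 0 \qquad (k = 1, 2, \dots, r), \qquad (34)$$

can be represented in the form of a product of a lower triangular matrix B and an upper triangular matrix C:

$$A = BC = \begin{pmatrix} b_{11} & 0 & \cdots & 0\\ b_{21} & b_{22} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{pmatrix} \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n}\\ 0 & c_{22} & \cdots & c_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & c_{nn} \end{pmatrix}. \qquad (35)$$

Here

$$b_{11} c_{11} = D_1, \quad b_{22} c_{22} = \frac{D_2}{D_1}, \quad \dots, \quad b_{rr} c_{rr} = \frac{D_r}{D_{r-1}}. \qquad (36)$$

The values of the first r diagonal elements of B and C can be chosen arbitrarily subject to the conditions (36).

When the first r diagonal elements of B and C are given, the elements of the first r columns of B and of the first r rows of C are uniquely determined, and are given by the following formulas:

$$b_{gk} = b_{kk} \frac{A\begin{pmatrix}1&2&\cdots&k-1&g\\1&2&\cdots&k-1&k\end{pmatrix}}{A\begin{pmatrix}1&2&\cdots&k\\1&2&\cdots&k\end{pmatrix}}, \qquad c_{kg} = c_{kk} \frac{A\begin{pmatrix}1&2&\cdots&k-1&k\\1&2&\cdots&k-1&g\end{pmatrix}}{A\begin{pmatrix}1&2&\cdots&k\\1&2&\cdots&k\end{pmatrix}} \qquad (g = k, k+1, \dots, n;\ k = 1, 2, \dots, r). \qquad (37)$$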




If $r < n$ ($|A| = 0$), then all the elements in the last $n - r$ columns of B can be put equal to zero and all the elements of the last $n - r$ rows of C can be chosen arbitrarily. Conversely, the last $n - r$ rows of C can be filled with zeros and the last $n - r$ columns of B can be chosen arbitrarily.

Proof. That a matrix satisfying conditions (34) can be represented in the form of a product (35) has been proved above (see (33')).

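Now let B and C be arbitrary lower and upper triangular matrices whose product is A. Making use of the formulas for the minors of the product of two matrices we find:

$$A\begin{pmatrix}1&2&\cdots&k-1&g\\1&2&\cdots&k-1&k\end{pmatrix} = \sum_{\alpha_1 < \alpha_2 < \cdots < \alpha_k} B\begin{pmatrix}1&2&\cdots&k-1&g\\\alpha_1&\alpha_2&\cdots&\alpha_{k-1}&\alpha_k\end{pmatrix} C\begin{pmatrix}\alpha_1&\alpha_2&\cdots&\alpha_{k-1}&\alpha_k\\1&2&\cdots&k-1&k\end{pmatrix} \qquad (38)$$

($g = k, k+1, \dots, n$; $k = 1, 2, \dots, r$).

Since C is an upper triangular matrix, the first k columns of C contain only one non-vanishing minor of order k, namely $C\begin{pmatrix}1&2&\cdots&k\\1&2&\cdots&k\end{pmatrix}$. Therefore, equation (38) can be written as follows:

$$A\begin{pmatrix}1&\cdots&k-1&g\\1&\cdots&k-1&k\end{pmatrix} = B\begin{pmatrix}1&\cdots&k-1&g\\1&\cdots&k-1&k\end{pmatrix} C\begin{pmatrix}1&\cdots&k\\1&\cdots&k\end{pmatrix} = b_{11} b_{22} \cdots b_{k-1,k-1} b_{gk}\, c_{11} c_{22} \cdots c_{kk} \qquad (39)$$

($g = k, k+1, \dots, n$; $k = 1, 2, \dots, r$).

We put g = k in this equation, obtaining

$$b_{11} b_{22} \cdots b_{kk}\, c_{11} c_{22} \cdots c_{kk} = D_k \qquad (k = 1, 2, \dots, r), \qquad (40)$$

and relations (36) follow.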

Without violating equation (35) we may multiply the matrix B in that equation on the right by an arbitrary non-singular diagonal matrix $M = \| \mu_i \delta_{ik} \|_1^n$, while multiplying C at the same time on the left by $M^{-1} = \| \mu_i^{-1} \delta_{ik} \|_1^n$. But this is equivalent to multiplying the columns of B by $\mu_1, \mu_2, \dots, \mu_n$, respectively, and the rows of C by $\mu_1^{-1}, \mu_2^{-1}, \dots, \mu_n^{-1}$. We may therefore give arbitrary values to the diagonal elements $b_{11}, b_{22}, \dots, b_{rr}$ and $c_{11}, c_{22}, \dots, c_{rr}$, provided they satisfy (36).

Further, from (39) and (40) we find:

$$b_{gk} = b_{kk} \frac{A\begin{pmatrix}1&2&\cdots&k-1&g\\1&2&\cdots&k-1&k\end{pmatrix}}{D_k} \qquad (g = k, k+1, \dots, n;\ k = 1, 2, \dots, r),$$

i.e., the first formulas in (37). The second formulas in (37), for the elements of C, are established similarly.




We observe that in the multiplication of B and C the elements $b_{ik}$ of the last $n - r$ columns of B and the elements $c_{kj}$ of the last $n - r$ rows of C are multiplied only among each other. We have seen that all the elements of the last $n - r$ rows of C may be chosen to be zero.⁵ But as a consequence, the elements of the last $n - r$ columns of B may be chosen arbitrarily. Clearly the product of B and C does not change if we choose the last $n - r$ columns of B to be zeros and choose the elements of the last $n - r$ rows of C arbitrarily.

This completes the proof of the theorem. 


From this theorem there follow a number of interesting corollaries. 


COROLLARY 1: The elements of the first r columns of B and the first r rows of C are connected with the elements of A by the recurrence relations

$$
\begin{aligned}
b_{ik} &= \frac{a_{ik} - \sum_{j=1}^{k-1} b_{ij} c_{jk}}{c_{kk}} && (i \ge k;\ i = 1, 2, \dots, n;\ k = 1, 2, \dots, r),\\
c_{ik} &= \frac{a_{ik} - \sum_{j=1}^{i-1} b_{ij} c_{jk}}{b_{ii}} && (i \le k;\ i = 1, 2, \dots, r;\ k = 1, 2, \dots, n).
\end{aligned}
\qquad (41)
$$

The relations (41) follow immediately from the matrix equation (35). They can be used to advantage in the actual computation of the elements of B and C.
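The recurrences (41) translate directly into a compact factorization routine. The following is a minimal sketch for the non-singular case r = n in exact `Fraction` arithmetic. The function name `triangular_factors` and its parameter `b_diag` are our own; `b_diag` is the freely chosen diagonal of B and defaults to all ones. The routine computes the k-th row of C and the k-th column of B alternately, because each of them needs only entries found earlier.

```python
from fractions import Fraction

def triangular_factors(A, b_diag=None):
    """Factor A = B C by the recurrences (41): B lower triangular with the
    prescribed diagonal b_diag (default all 1), C upper triangular.
    Assumes D_k != 0 for k = 1, ..., n (the case r = n)."""
    n = len(A)
    a = [[Fraction(x) for x in row] for row in A]
    bd = [Fraction(1)] * n if b_diag is None else [Fraction(x) for x in b_diag]
    B = [[Fraction(0)] * n for _ in range(n)]
    C = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n):
        B[k][k] = bd[k]
        # k-th row of C: c_ki = (a_ki - sum_{j<k} b_kj c_ji) / b_kk, i >= k
        for i in range(k, n):
            s = sum((B[k][j] * C[j][i] for j in range(k)), Fraction(0))
            C[k][i] = (a[k][i] - s) / B[k][k]
        if C[k][k] == 0:
            raise ValueError("leading principal minor D_%d vanishes" % (k + 1))
        # k-th column of B: b_ik = (a_ik - sum_{j<k} b_ij c_jk) / c_kk, i > k
        for i in range(k + 1, n):
            s = sum((B[i][j] * C[j][k] for j in range(k)), Fraction(0))
            B[i][k] = (a[i][k] - s) / C[k][k]
    return B, C

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum((X[i][t] * Y[t][j] for t in range(len(Y))), Fraction(0))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
B, C = triangular_factors(A)
print(matmul(B, C) == A)  # True
```

With the default choice $b_{kk} = 1$ this is the variant usually called Doolittle's method. A different `b_diag` only rescales the columns of B and the rows of C, and the products $b_{kk} c_{kk}$ still satisfy (36).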


COROLLARY 2: If $A = \| a_{ik} \|_1^n$ is a non-singular matrix ($r = n$) satisfying (34), then the matrices B and C in the representation (35) are uniquely determined as soon as the diagonal elements of these matrices are chosen in accordance with (36).


COROLLARY 3: If $S = \| s_{ik} \|_1^n$ is a symmetric matrix of rank r and

$$D_k = S\begin{pmatrix}1&2&\cdots&k\\1&2&\cdots&k\end{pmatrix} \neq 0 \qquad (k = 1, 2, \dots, r),$$

then

$$S = BB',$$

where $B = \| b_{ik} \|_1^n$ is a lower triangular matrix in which


5 This follows from the representation (33'). Here, as we have shown already, arbitrary values may be given to the diagonal elements $b_{11}, \dots, b_{rr}, c_{11}, \dots, c_{rr}$, provided (36) is satisfied, by the introduction of suitable factors $\mu_1, \mu_2, \dots, \mu_r$.




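$$b_{gk} = \begin{cases} \dfrac{1}{\sqrt{D_k D_{k-1}}}\, S\begin{pmatrix}1&2&\cdots&k-1&g\\1&2&\cdots&k-1&k\end{pmatrix} & (g = k, k+1, \dots, n;\ k = 1, 2, \dots, r),\\[2ex] 0 & (g = k, k+1, \dots, n;\ k = r+1, \dots, n), \end{cases} \qquad (42)$$

where $D_0 = 1$.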
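Formula (42) can be evaluated entry by entry. The sketch below uses hypothetical helper names and keeps the 1-based indices k and g of (42). It assumes r = n and $D_k > 0$ for every k, which holds for example when S is positive definite, so that all the square roots are real. A negative ratio $D_k / D_{k-1}$ would make $b_{kk}$ imaginary. Evaluating each minor separately costs far more than the recurrences (41), so the sketch is only meant to illustrate the formula.

```python
import math
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination with row interchanges."""
    a = [[Fraction(x) for x in row] for row in M]
    n, result = len(a), Fraction(1)
    for k in range(n):
        piv = next((i for i in range(k, n) if a[i][k] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != k:
            a[k], a[piv] = a[piv], a[k]
            result = -result
        result *= a[k][k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return result

def minor(A, rows, cols):
    """The minor of A formed from the given (0-based) rows and columns."""
    return det([[A[i][j] for j in cols] for i in rows])

def factor_by_42(S):
    """Lower triangular B with S = B B', entries taken from formula (42).
    Assumes r = n and all D_k > 0, so the square roots are real."""
    n = len(S)
    D = [minor(S, list(range(k)), list(range(k))) for k in range(n + 1)]  # D[0] = 1
    B = [[0.0] * n for _ in range(n)]
    for k in range(1, n + 1):                 # k and g are 1-based, as in (42)
        head = list(range(k - 1))             # rows/columns 1, ..., k-1
        scale = math.sqrt(float(D[k] * D[k - 1]))
        for g in range(k, n + 1):
            B[g - 1][k - 1] = float(minor(S, head + [g - 1], head + [k - 1])) / scale
    return B

S = [[4, 2, 2], [2, 5, 3], [2, 3, 6]]
B = factor_by_42(S)
print(B)  # [[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 2.0]]
```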


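2. In the representation (35) let the elements of the last $n - r$ columns of B and of the last $n - r$ rows of C be zero. Then we may set

$$B = F \begin{pmatrix} b_{11} & & & & & \\ & \ddots & & & & \\ & & b_{rr} & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} c_{11} & & & & & \\ & \ddots & & & & \\ & & c_{rr} & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{pmatrix} L, \qquad (43)$$

where F and L are lower and upper triangular matrices respectively. The first r diagonal elements of F and L are 1, and the elements of the last $n - r$ columns of F and of the last $n - r$ rows of L can be chosen completely arbitrarily. Substituting (43) for B and C in (35) and using (36), we obtain the following theorem: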


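THEOREM 2: Every matrix $A = \| a_{ik} \|_1^n$ of rank r in which

$$D_k = A\begin{pmatrix}1&2&\cdots&k\\1&2&\cdots&k\end{pmatrix} \neq 0 \qquad (k = 1, 2, \dots, r)$$

can be represented in the form of a product of a lower triangular matrix F, a diagonal matrix D, and an upper triangular matrix L:

$$A = FDL = \begin{pmatrix} 1 & 0 & \cdots & 0\\ f_{21} & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ f_{n1} & f_{n2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} D_1 & & & & & \\ & \frac{D_2}{D_1} & & & & \\ & & \ddots & & & \\ & & & \frac{D_r}{D_{r-1}} & & \\ & & & & 0 & \\ & & & & & \ddots \end{pmatrix} \begin{pmatrix} 1 & l_{12} & \cdots & l_{1n}\\ 0 & 1 & \cdots & l_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 \end{pmatrix}, \qquad (44)$$

where

$$f_{gk} = \frac{A\begin{pmatrix}1&2&\cdots&k-1&g\\1&2&\cdots&k-1&k\end{pmatrix}}{A\begin{pmatrix}1&2&\cdots&k\\1&2&\cdots&k\end{pmatrix}}, \qquad l_{kg} = \frac{A\begin{pmatrix}1&2&\cdots&k-1&k\\1&2&\cdots&k-1&g\end{pmatrix}}{A\begin{pmatrix}1&2&\cdots&k\\1&2&\cdots&k\end{pmatrix}} \qquad (g = k+1, \dots, n;\ k = 1, 2, \dots, r). \qquad (45)$$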
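The factors of Theorem 2 can be read off directly from (45). Below is a minimal sketch for the case r = n, in exact `Fraction` arithmetic. The helper names are hypothetical and the example matrix is our own.

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination with row interchanges."""
    a = [[Fraction(x) for x in row] for row in M]
    n, result = len(a), Fraction(1)
    for k in range(n):
        piv = next((i for i in range(k, n) if a[i][k] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != k:
            a[k], a[piv] = a[piv], a[k]
            result = -result
        result *= a[k][k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return result

def minor(A, rows, cols):
    """The minor of A formed from the given (0-based) rows and columns."""
    return det([[A[i][j] for j in cols] for i in rows])

def fdl(A):
    """F, D, L of (44) with entries from (45); assumes D_1, ..., D_n != 0."""
    n = len(A)
    Dk = [minor(A, list(range(k)), list(range(k))) for k in range(n + 1)]  # Dk[0] = 1
    F = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    D = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):                 # k and g are 1-based, as in (45)
        D[k - 1][k - 1] = Dk[k] / Dk[k - 1]
        head = list(range(k - 1))
        for g in range(k + 1, n + 1):
            F[g - 1][k - 1] = minor(A, head + [g - 1], head + [k - 1]) / Dk[k]
            L[k - 1][g - 1] = minor(A, head + [k - 1], head + [g - 1]) / Dk[k]
    return F, D, L

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum((X[i][t] * Y[t][j] for t in range(len(Y))), Fraction(0))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
F, D, L = fdl(A)
print(matmul(matmul(F, D), L) == A)  # True
```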




3. The elimination method of Gauss, when applied to a matrix $A = \| a_{ik} \|_1^n$ of rank r for which $D_k \neq 0$ ($k = 1, 2, \dots, r$), yields two matrices: a lower triangular matrix W with diagonal elements 1, and an upper triangular matrix G in which the first r diagonal elements are $D_1, \frac{D_2}{D_1}, \dots, \frac{D_r}{D_{r-1}}$ and the last $n - r$ rows consist entirely of zeros. G is the Gaussian form of the matrix A; W is the transforming matrix.

For actual computation of the elements of W we recommend the following device.

We obtain the matrix W when we apply to the unit matrix E all the transformations (given by $W_1, \dots, W_N$) that we have performed on A in the algorithm of Gauss. In this case, instead of the product WA, which equals G, we obtain the product WE, which equals W. Let us, therefore, write the unit matrix E on the right of A:

$$\left(\begin{array}{ccc|ccc} a_{11} & \cdots & a_{1n} & 1 & \cdots & 0\\ \vdots & & \vdots & \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn} & 0 & \cdots & 1 \end{array}\right). \qquad (46)$$

By applying all the transformations of the algorithm of Gauss to this rectangular matrix we obtain a rectangular matrix consisting of the two square matrices G and W:

$$(G, W).$$

Thus, the application of Gauss's algorithm to the matrix (46) gives the matrices G and W simultaneously.
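The device of writing E to the right of A is easy to mechanize. The sketch below assumes $D_k \neq 0$ for $k = 1, \dots, r$, so that no row interchanges are needed; the name `gauss_with_transform` is hypothetical. The elimination stops at the first vanishing pivot. Under the stated assumption this happens only after r steps, when the remaining rows of G are already zero.

```python
from fractions import Fraction

def gauss_with_transform(A):
    """Apply Gauss's elimination to (A, E) and return (G, W), so that G = W A.
    No row interchanges: assumes D_k != 0 for k = 1, ..., r (r = rank A)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        if M[k][k] == 0:
            break   # under the assumption this happens only at k = r
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            # subtract f times row k from row i in both halves of (A, E)
            for j in range(k, 2 * n):
                M[i][j] -= f * M[k][j]
    G = [row[:n] for row in M]
    W = [row[n:] for row in M]
    return G, W

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
G, W = gauss_with_transform(A)
print([[str(v) for v in row] for row in W])
# [['1', '0', '0'], ['-2', '1', '0'], ['2', '-3', '1']]
```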

If A is non-singular, so that $|A| \neq 0$, then $|G| \neq 0$ as well. In this case, (33) implies that $A^{-1} = G^{-1} W$. Since G and W are determined by means of the algorithm of Gauss, the task of finding the inverse matrix $A^{-1}$ reduces to determining $G^{-1}$ and multiplying $G^{-1}$ by W.

Finding the inverse matrix $G^{-1}$ once G has been determined presents no difficulty, because G is triangular. Nevertheless, the operations involved can be avoided. For this purpose we introduce, together with the matrices G and W, similar matrices $G_1$ and $W_1$ for the transposed matrix A'. Then $A' = W_1^{-1} G_1$, i.e.,

$$A = G_1' \, {W_1'}^{-1}. \qquad (47)$$

Let us compare (33') with (44):

$$A = W^{-1} G, \qquad A = FDL.$$




These equations may be regarded as two distinct decompositions of the form (35); here we take the product DL as the second factor C. Since the first r diagonal elements of the first factors are the same (they are equal to 1), their first r columns coincide. But then, since the last $n - r$ columns of F may be chosen arbitrarily, we choose them such that

$$F = W^{-1}. \qquad (48)$$

On the other hand, a comparison of (47) with (44),

$$A = G_1' \, {W_1'}^{-1}, \qquad A = FDL,$$

shows that we may also select the arbitrary elements of L in such a way that

$$L = {W_1'}^{-1}. \qquad (49)$$

Replacing F and L in (44) by their expressions (48) and (49), we obtain

$$A = W^{-1} D \, {W_1'}^{-1}. \qquad (50)$$

Comparing this equation with (33') and (47) we find:

$$G = D \, {W_1'}^{-1}, \qquad G_1' = W^{-1} D. \qquad (51)$$

We now introduce the diagonal matrix

$$\hat{D} = \begin{pmatrix} \frac{1}{D_1} & & & & & \\ & \frac{D_1}{D_2} & & & & \\ & & \ddots & & & \\ & & & \frac{D_{r-1}}{D_r} & & \\ & & & & 0 & \\ & & & & & \ddots \end{pmatrix}. \qquad (52)$$

Since

$$D = D \hat{D} D,$$

it follows from (50) and (51) that

$$A = G_1' \hat{D} G. \qquad (53)$$


Formula (53) shows that the decomposition of A into triangular factors 
can be obtained by applying the algorithm of Gauss to the matrices A and A’. 


Now let A be non-singular ($r = n$). Then $|D| \neq 0$ and $\hat{D} = D^{-1}$. Therefore it follows from (50) that

$$A^{-1} = W_1' \hat{D} W. \qquad (54)$$

This formula yields an effective computation of the inverse matrix $A^{-1}$ by the application of Gauss's algorithm to the rectangular matrices

$$(A, E), \qquad (A', E).$$
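Formula (54) needs only the two transforming matrices and the diagonal of G. The following sketch runs the elimination on (A, E) and on (A', E) and assembles $W_1' \hat{D} W$. It uses hypothetical names and exact arithmetic, and it assumes that A is non-singular with all $D_k \neq 0$. The diagonal of G is $D_1, D_2/D_1, \dots, D_n/D_{n-1}$, so the entries of $\hat{D}$ are the reciprocals of those diagonal elements. The test matrix is the one used in the Example at the end of § 5.

```python
from fractions import Fraction

def gauss_with_transform(A):
    """Gauss's elimination on (A, E) without interchanges; returns (G, W)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        if M[k][k] == 0:
            break
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, 2 * n):
                M[i][j] -= f * M[k][j]
    return [row[:n] for row in M], [row[n:] for row in M]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_by_54(A):
    """A^{-1} = W_1' D^ W as in (54). Assumes A is non-singular and all D_k != 0,
    so that neither elimination needs row interchanges."""
    n = len(A)
    G, W = gauss_with_transform(A)
    _, W1 = gauss_with_transform(transpose(A))
    # the diagonal of G is D_1, D_2/D_1, ..., D_n/D_{n-1}; D^ holds the reciprocals (52)
    D_hat = [[1 / G[i][i] if i == j else Fraction(0) for j in range(n)]
             for i in range(n)]
    return matmul(matmul(transpose(W1), D_hat), W)

A = [[2, 1, 1], [1, 0, 2], [3, 1, 2]]
print([[str(v) for v in row] for row in inverse_by_54(A)])
# [['-2', '-1', '2'], ['4', '1', '-3'], ['1', '1', '-1']]
```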




If, in particular, we take as our A a symmetric matrix S, then $G_1$ coincides with G and $W_1$ with W, and therefore formulas (53) and (54) assume the form

$$S = G' \hat{D} G, \qquad (55)$$

$$S^{-1} = W' \hat{D} W. \qquad (56)$$


§ 5. The Partition of a Matrix into Blocks. The Technique of Operating with Partitioned Matrices. The Generalized Algorithm of Gauss

It often becomes necessary to use matrices that are partitioned into rectangular parts, called 'cells' or 'blocks.' In the present section we deal with such partitioned matrices.


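1. Let a rectangular matrix

$$A = \| a_{ik} \| \qquad (i = 1, 2, \dots, m;\ k = 1, 2, \dots, n) \qquad (57)$$

be given. By means of horizontal and vertical lines we dissect A into rectangular blocks:

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1t}\\ A_{21} & A_{22} & \cdots & A_{2t}\\ \vdots & \vdots & & \vdots\\ A_{s1} & A_{s2} & \cdots & A_{st} \end{pmatrix}. \qquad (58)$$

We shall say of matrix (58) that it is partitioned into st blocks, or cells, $A_{\alpha\beta}$ of dimensions $m_\alpha \times n_\beta$ ($\alpha = 1, 2, \dots, s$; $\beta = 1, 2, \dots, t$), or that it is represented in the form of a partitioned, or blocked, matrix. Instead of (58) we shall simply write

$$A = (A_{\alpha\beta}) \qquad (\alpha = 1, 2, \dots, s;\ \beta = 1, 2, \dots, t). \qquad (59)$$

In the case s = t we shall use the following notation:

$$A = (A_{\alpha\beta})_1^s. \qquad (60)$$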




Operations on partitioned matrices are performed according to the same formal rules as in the case in which we have numerical elements instead of blocks. For example, let A and B be two rectangular matrices of equal dimensions partitioned into blocks in exactly the same way:

$$A = (A_{\alpha\beta}), \qquad B = (B_{\alpha\beta}) \qquad (\alpha = 1, 2, \dots, s;\ \beta = 1, 2, \dots, t). \qquad (61)$$

It is easy to verify that

$$A + B = (A_{\alpha\beta} + B_{\alpha\beta}) \qquad (\alpha = 1, 2, \dots, s;\ \beta = 1, 2, \dots, t). \qquad (62)$$


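We have to consider multiplication of partitioned matrices in more detail. We know (see Chapter I, p. 6) that for the multiplication of two rectangular matrices A and B the length of the rows of the first factor A must be the same as the height of the columns of the second factor B. For 'block' multiplication of these matrices we require, in addition, that the partitioning into blocks be such that the horizontal dimensions in the first factor are the same as the corresponding vertical dimensions in the second:

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1t}\\ A_{21} & A_{22} & \cdots & A_{2t}\\ \vdots & \vdots & & \vdots\\ A_{s1} & A_{s2} & \cdots & A_{st} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1u}\\ B_{21} & B_{22} & \cdots & B_{2u}\\ \vdots & \vdots & & \vdots\\ B_{t1} & B_{t2} & \cdots & B_{tu} \end{pmatrix}, \qquad (63)$$

where the blocks $A_{\alpha\delta}$ have dimensions $m_\alpha \times n_\delta$ and the blocks $B_{\delta\beta}$ have $n_\delta$ rows ($\delta = 1, 2, \dots, t$).

Then it is easy to verify that

$$AB = C = (C_{\alpha\beta}), \quad \text{where} \quad C_{\alpha\beta} = \sum_{\delta=1}^{t} A_{\alpha\delta} B_{\delta\beta} \qquad (\alpha = 1, 2, \dots, s;\ \beta = 1, 2, \dots, u). \qquad (64)$$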
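Formula (64) can be confirmed directly. The sketch splits two matrices into conformable blocks, multiplies them block by block and reassembles the result. It is written in plain Python. The names `split`, `join` and `block_product` are hypothetical, and the block sizes in the example are arbitrary choices that satisfy the compatibility condition above.

```python
def matmul(X, Y):
    """Ordinary product of two matrices given as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(M, row_sizes, col_sizes):
    """Partition M into blocks M_ab of size row_sizes[a] x col_sizes[b]."""
    blocks, r0 = [], 0
    for m in row_sizes:
        row, c0 = [], 0
        for n in col_sizes:
            row.append([line[c0:c0 + n] for line in M[r0:r0 + m]])
            c0 += n
        blocks.append(row)
        r0 += m
    return blocks

def join(blocks):
    """Reassemble an ordinary matrix from a scheme of blocks."""
    out = []
    for brow in blocks:
        for i in range(len(brow[0])):
            out.append([x for blk in brow for x in blk[i]])
    return out

def block_product(Ab, Bb):
    """C_ab = sum over d of A_ad B_db, formula (64)."""
    s, t, u = len(Ab), len(Bb), len(Bb[0])
    C = []
    for a in range(s):
        row = []
        for b in range(u):
            acc = matmul(Ab[a][0], Bb[0][b])
            for d in range(1, t):
                acc = matadd(acc, matmul(Ab[a][d], Bb[d][b]))
            row.append(acc)
        C.append(row)
    return C

A = [[1, 2, 0, 3], [0, 1, 4, 1], [2, 0, 1, 5]]   # rows split 1+2, columns 3+1
B = [[1, 0], [2, 1], [0, 3], [1, 1]]             # rows split 3+1, columns 1+1
C = join(block_product(split(A, [1, 2], [3, 1]), split(B, [3, 1], [1, 1])))
print(C == matmul(A, B))  # True
```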


We mention separately the special case in which one of the factors is a quasi-diagonal matrix. Let A be quasi-diagonal, i.e., let s = t and $A_{\alpha\beta} = O$ for $\alpha \neq \beta$. In this case formula (64) gives

$$C_{\alpha\beta} = A_{\alpha\alpha} B_{\alpha\beta} \qquad (\alpha = 1, 2, \dots, s;\ \beta = 1, 2, \dots, u). \qquad (65)$$

When a partitioned matrix is multiplied on the left by a quasi-diagonal matrix, then the rows of blocks of the matrix are multiplied on the left by the corresponding diagonal blocks of the quasi-diagonal matrix.

Now let B be a quasi-diagonal matrix, i.e., let t = u and $B_{\alpha\beta} = O$ for $\alpha \neq \beta$. Then we obtain from (64):

$$C_{\alpha\beta} = A_{\alpha\beta} B_{\beta\beta} \qquad (\alpha = 1, 2, \dots, s;\ \beta = 1, 2, \dots, u). \qquad (66)$$




When a partitioned matrix is multiplied on the right by a quasi-diagonal matrix, then all the columns of blocks of the partitioned matrix are multiplied on the right by the corresponding diagonal cells of the quasi-diagonal matrix.


Note that the multiplication of square partitioned matrices of one and 
the same order is always feasible if the factors are split into equal quadratic 
schemes of blocks and there are square matrices on the diagonal places in 
each factor. 

The partitioned matrix (58) is called upper (lower) quasi-triangular if s = t and all $A_{\alpha\beta} = O$ for $\alpha > \beta$ ($\alpha < \beta$). A quasi-diagonal matrix is a special case of a quasi-triangular matrix.

From the formulas (64) it is easy to see that:

The product of two upper (lower) quasi-triangular matrices is itself an upper (lower) quasi-triangular matrix;⁶ the diagonal cells of the product are obtained by multiplying the corresponding diagonal cells of the factors.

For when we set s = t = u in (64) and

$$A_{\alpha\beta} = O, \quad B_{\alpha\beta} = O \qquad \text{for } \alpha > \beta,$$

we find

$$C_{\alpha\beta} = O \quad \text{for } \alpha > \beta \qquad \text{and} \qquad C_{\alpha\alpha} = A_{\alpha\alpha} B_{\alpha\alpha} \qquad (\alpha, \beta = 1, 2, \dots, s).$$


The case of lower quasi-triangular matrices is treated similarly. 


We mention a rule for the calculation of the determinant of a quasi-triangular matrix. This rule can be obtained from the Laplace expansion.

If A is a quasi-triangular matrix (in particular, a quasi-diagonal matrix), then the determinant of the matrix is equal to the product of the determinants of the diagonal cells:

$$|A| = |A_{11}| \, |A_{22}| \cdots |A_{ss}|. \qquad (67)$$
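2. Let a partitioned matrix

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1t}\\ A_{21} & A_{22} & \cdots & A_{2t}\\ \vdots & \vdots & & \vdots\\ A_{s1} & A_{s2} & \cdots & A_{st} \end{pmatrix} \qquad (68)$$

(with blocks $A_{\alpha\beta}$ of dimensions $m_\alpha \times n_\beta$)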
6 It is assumed here that the block multiplication is feasible. 




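be given. To the α-th row of submatrices we add the β-th row, multiplied on the left by a rectangular matrix X of dimension $m_\alpha \times m_\beta$. We obtain a partitioned matrix

$$B = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1t}\\ \vdots & \vdots & & \vdots\\ A_{\alpha 1} + X A_{\beta 1} & A_{\alpha 2} + X A_{\beta 2} & \cdots & A_{\alpha t} + X A_{\beta t}\\ \vdots & \vdots & & \vdots\\ A_{\beta 1} & A_{\beta 2} & \cdots & A_{\beta t}\\ \vdots & \vdots & & \vdots\\ A_{s1} & A_{s2} & \cdots & A_{st} \end{pmatrix}. \qquad (69)$$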


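We introduce an auxiliary square matrix V, which we give in the form of a square scheme of blocks:

$$V = \begin{pmatrix} E & \cdots & O & \cdots & O & \cdots & O\\ \vdots & \ddots & \vdots & & \vdots & & \vdots\\ O & \cdots & E & \cdots & X & \cdots & O\\ \vdots & & \vdots & \ddots & \vdots & & \vdots\\ O & \cdots & O & \cdots & E & \cdots & O\\ \vdots & & \vdots & & \vdots & \ddots & \vdots\\ O & \cdots & O & \cdots & O & \cdots & E \end{pmatrix}. \qquad (70)$$

In the diagonal blocks of V there are unit matrices of order $m_1, m_2, \dots, m_s$, respectively; all the non-diagonal blocks of V are equal to zero except the block X that lies at the intersection of the α-th row and β-th column.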

It is easy to see that

$$VA = B. \qquad (71)$$

As V is non-singular, we have⁷ for the ranks of A and B:

$$r_A = r_B. \qquad (72)$$

In the special case where A is a square matrix, we have from (71):

$$|V| \, |A| = |B|. \qquad (73)$$

But the determinant of the quasi-triangular matrix V is 1:

$$|V| = 1. \qquad (74)$$

Hence

$$|A| = |B|. \qquad (75)$$


7 See p. 12. 




The same conclusion holds when we add to an arbitrary column of (68) another column multiplied on the right by a rectangular matrix X of suitable dimensions.

The results obtained can be formulated as the following theorem. 


THEOREM 3: If to the α-th row (column) of the blocks of the partitioned matrix A we add the β-th row (column) multiplied on the left (right) by a rectangular matrix X of the corresponding dimensions, then the rank of A remains unchanged under this transformation and, if A is a square matrix, the determinant of A is also unchanged.


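3. We now consider the special case in which the diagonal block $A_{11}$ in A is square and non-singular ($|A_{11}| \neq 0$). To the α-th row of A we add the first row multiplied on the left by $-A_{\alpha 1} A_{11}^{-1}$ ($\alpha = 2, \dots, s$). We thus obtain the matrix

$$B = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1t}\\ O & A_{22}^{(1)} & \cdots & A_{2t}^{(1)}\\ \vdots & \vdots & & \vdots\\ O & A_{s2}^{(1)} & \cdots & A_{st}^{(1)} \end{pmatrix}, \qquad (76)$$

where

$$A_{\alpha\beta}^{(1)} = -A_{\alpha 1} A_{11}^{-1} A_{1\beta} + A_{\alpha\beta} \qquad (\alpha = 2, \dots, s;\ \beta = 2, \dots, t). \qquad (77)$$

If the matrix $A_{22}^{(1)}$ is square and non-singular, then the process can be continued. In this way we arrive at the generalized algorithm of Gauss.

Let A be a square matrix. Then

$$|A| = |B| = |A_{11}| \begin{vmatrix} A_{22}^{(1)} & \cdots & A_{2t}^{(1)}\\ \vdots & & \vdots\\ A_{s2}^{(1)} & \cdots & A_{st}^{(1)} \end{vmatrix}. \qquad (78)$$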


Formula (78) reduces the computation of the determinant |A|, consisting of st blocks, to the computation of a determinant of lower order consisting of $(s-1)(t-1)$ blocks.⁸

Let us consider a determinant Δ partitioned into four blocks:

$$\Delta = \begin{vmatrix} A & B\\ C & D \end{vmatrix}, \qquad (79)$$

where A and D are square matrices.

Suppose $|A| \neq 0$. Then from the second row we subtract the first multiplied on the left by $CA^{-1}$. We obtain


8 If $A_{22}^{(1)}$ is a square matrix and $|A_{22}^{(1)}| \neq 0$, then this determinant of $(s-1)(t-1)$ blocks can again be subjected to such a transformation, etc.




$$\Delta = \begin{vmatrix} A & B\\ O & D - CA^{-1}B \end{vmatrix} = |A| \, |D - CA^{-1}B|. \qquad (I)$$

Similarly, if $|D| \neq 0$, we subtract from the first row in Δ the second multiplied on the left by $BD^{-1}$, obtaining

$$\Delta = \begin{vmatrix} A - BD^{-1}C & O\\ C & D \end{vmatrix} = |A - BD^{-1}C| \, |D|. \qquad (II)$$

In the special case in which all four matrices A, B, C, D are square (of one and the same order n), we deduce from (I) and (II) the formulas of Schur, which reduce the computation of a determinant of order 2n to the computation of a determinant of order n:

$$\Delta = |AD - ACA^{-1}B| \qquad (|A| \neq 0), \qquad (Ia)$$

$$\Delta = |AD - BD^{-1}CD| \qquad (|D| \neq 0). \qquad (IIa)$$


If the matrices A and C are permutable, then it follows from (Ia) that

$$\Delta = |AD - CB| \qquad (\text{provided } AC = CA). \qquad (Ib)$$

Similarly, if C and D are permutable, then

$$\Delta = |AD - BC| \qquad (\text{provided } CD = DC). \qquad (IIb)$$

Formula (Ib) was obtained under the assumption $|A| \neq 0$, and (IIb) under the assumption $|D| \neq 0$. However, these restrictions can be removed by continuity arguments.

From formulas (I)-(IIb) we can obtain another six formulas by replacing A and D on the right-hand sides simultaneously by B and C.


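Example.

$$\Delta = \begin{vmatrix} 1 & 0 & b_1 & b_2\\ 0 & 1 & b_3 & b_4\\ c_1 & c_2 & d_1 & d_2\\ c_3 & c_4 & d_3 & d_4 \end{vmatrix}.$$

By formula (Ib),

$$\Delta = \begin{vmatrix} d_1 - c_1 b_1 - c_2 b_3 & d_2 - c_1 b_2 - c_2 b_4\\ d_3 - c_3 b_1 - c_4 b_3 & d_4 - c_3 b_2 - c_4 b_4 \end{vmatrix}.$$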
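Here is a numerical check, in exact arithmetic, of Schur's reduction (I) and of the Example, where the block A = E commutes with C. The 2 x 2 blocks are our own sample values, and `inv2` and `assemble` are hypothetical helper names.

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination with row interchanges."""
    a = [[Fraction(x) for x in row] for row in M]
    n, result = len(a), Fraction(1)
    for k in range(n):
        piv = next((i for i in range(k, n) if a[i][k] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != k:
            a[k], a[piv] = a[piv], a[k]
            result = -result
        result *= a[k][k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return result

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matsub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inv2(A):
    """Inverse of a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = A
    dt = Fraction(a * d - b * c)
    return [[d / dt, -b / dt], [-c / dt, a / dt]]

def assemble(A, B, C, D):
    """The matrix with blocks A B in the upper half and C D in the lower half."""
    return [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]

A = [[2, 1], [1, 1]]
B = [[1, 2], [0, 1]]
C = [[3, 1], [1, 4]]
D = [[5, 2], [1, 3]]
Delta = det(assemble(A, B, C, D))
# formula (I): Delta = |A| |D - C A^{-1} B|
print(Delta == det(A) * det(matsub(D, matmul(matmul(C, inv2(A)), B))))  # True

# the Example: A = E commutes with C, so (Ib) gives Delta = |D - C B|
E = [[1, 0], [0, 1]]
print(det(assemble(E, B, C, D)) == det(matsub(D, matmul(C, B))))  # True
```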




4. From Theorem 3 there follows also

THEOREM 4: If a rectangular matrix R is represented in partitioned form

$$R = \begin{pmatrix} A & B\\ C & D \end{pmatrix}, \qquad (80)$$

where A is a square non-singular matrix of order n ($|A| \neq 0$), then the rank of R is equal to n if and only if

$$D = CA^{-1}B. \qquad (81)$$

Proof. We subtract from the second row of blocks of R the first, multiplied on the left by $CA^{-1}$. Then we obtain the matrix

$$T = \begin{pmatrix} A & B\\ O & D - CA^{-1}B \end{pmatrix}. \qquad (82)$$

By Theorem 3, the matrices R and T have the same rank. But the rank of T coincides with the rank of A (namely, n) if and only if $D - CA^{-1}B = O$, i.e., when (81) holds. This proves the theorem.

From Theorem 4 there follows an algorithm⁹ for the construction of the inverse matrix $A^{-1}$ and, more generally, of the product $CA^{-1}B$, where B and C are rectangular matrices of dimensions $n \times p$ and $q \times n$.

By means of Gauss's algorithm¹⁰ we reduce the matrix

$$\begin{pmatrix} A & B\\ -C & O \end{pmatrix} \qquad (83)$$

to the form

$$\begin{pmatrix} G & B_1\\ O & X \end{pmatrix}. \qquad (84)$$

We will show that

$$X = CA^{-1}B. \qquad (85)$$

For, the same transformation that was applied to the matrix (83) reduces the matrix

9 See [181].

10 We do not apply here the entire algorithm of Gauss to the matrix (83), but only the first n steps of the algorithm, where n is the order of the matrix A. This can be done if the conditions (15) hold for p = n. If these conditions do not hold, then, since $|A| \neq 0$, we may renumber the first n rows (or the first n columns) of the matrix (83) so that the n steps of Gauss's algorithm become feasible. Such a modified Gaussian algorithm is sometimes applied even when the conditions (15), with p = n, are satisfied.




$$\begin{pmatrix} A & B\\ -C & -CA^{-1}B \end{pmatrix} \qquad (86)$$

to the form

$$\begin{pmatrix} G & B_1\\ O & X - CA^{-1}B \end{pmatrix}. \qquad (87)$$

By Theorem 4, the matrix (86) is of rank n (n is the order of A). But then (87) must also be of rank n. Hence $X - CA^{-1}B = O$, i.e., (85) holds.

In particular, if B = y, where y is a column matrix, and C = E, then

$$X = A^{-1} y.$$


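Therefore, when we apply Gauss's algorithm to the matrix

$$\begin{pmatrix} A & y\\ -E & O \end{pmatrix},$$

we obtain the solution of the system of equations

$$Ax = y.$$

Further, if in (83) we set B = C = E, then by applying the algorithm of Gauss to the matrix

$$\begin{pmatrix} A & E\\ -E & O \end{pmatrix}$$

we obtain

$$\begin{pmatrix} G & B_1\\ O & X \end{pmatrix},$$

where

$$X = A^{-1}.$$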

Let us illustrate this method by finding $A^{-1}$ in the following example.

Example. Let

$$A = \begin{pmatrix} 2 & 1 & 1\\ 1 & 0 & 2\\ 3 & 1 & 2 \end{pmatrix}.$$

It is required to compute $A^{-1}$. We apply a somewhat modified elimination method¹¹ to the matrix


11 See the preceding footnote. 




$$\begin{pmatrix} 2 & 1 & 1 & 1 & 0 & 0\\ 1 & 0 & 2 & 0 & 1 & 0\\ 3 & 1 & 2 & 0 & 0 & 1\\ -1 & 0 & 0 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 & 0 & 0\\ 0 & 0 & -1 & 0 & 0 & 0 \end{pmatrix}$$

To all the rows we add certain multiples of the second row, arranging that all the elements of the first column except the second become zero. Then we add to all the rows except the second the third row multiplied by certain factors, so that in the second column all the elements except the second and third become zero. Then we add to the last three rows the first row with suitable factors and obtain a matrix of the form

$$\begin{pmatrix} * & * & * & * & * & *\\ * & * & * & * & * & *\\ * & * & * & * & * & *\\ 0 & 0 & 0 & -2 & -1 & 2\\ 0 & 0 & 0 & 4 & 1 & -3\\ 0 & 0 & 0 & 1 & 1 & -1 \end{pmatrix}.$$

Therefore

$$A^{-1} = \begin{pmatrix} -2 & -1 & 2\\ 4 & 1 & -3\\ 1 & 1 & -1 \end{pmatrix}.$$
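The procedure based on Theorem 4 can be sketched as a single routine that returns $CA^{-1}B$. The sketch works in exact `Fraction` arithmetic and uses hypothetical names. As pivot it takes any row of the upper part that has not yet been used, which is the kind of renumbering that footnote 10 allows. Its pivots therefore differ from those chosen in the Example, but the lower right-hand block comes out the same. Each pivot column is cleared in the unused upper rows and in the lower rows. That is enough to make the lower left-hand block O, and then by (85) the lower right-hand block equals $CA^{-1}B$.

```python
from fractions import Fraction

def c_ainv_b(A, B, C):
    """Return X = C A^{-1} B by clearing the first n columns of [[A, B], [-C, O]].
    Pivots may come from any unused upper row, so only |A| != 0 is required."""
    n, p, q = len(A), len(B[0]), len(C)
    M = [[Fraction(x) for x in A[i]] + [Fraction(x) for x in B[i]] for i in range(n)]
    M += [[-Fraction(x) for x in C[i]] + [Fraction(0)] * p for i in range(q)]
    unused = list(range(n))                  # upper rows not yet used as pivots
    for k in range(n):
        piv = next((i for i in unused if M[i][k] != 0), None)
        if piv is None:
            raise ValueError("A is singular")
        unused.remove(piv)
        for i in unused + list(range(n, n + q)):
            f = M[i][k] / M[piv][k]
            if f:
                M[i] = [x - f * y for x, y in zip(M[i], M[piv])]
    return [row[n:] for row in M[n:]]        # the lower right-hand block X

A = [[2, 1, 1], [1, 0, 2], [3, 1, 2]]
E = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(c_ainv_b(A, E, E) == [[-2, -1, 2], [4, 1, -3], [1, 1, -1]])  # True

# with B = y (a column) and C = E the routine solves A x = y
y = [[1], [2], [3]]
print([str(r[0]) for r in c_ainv_b(A, y, E)])  # ['2', '-3', '0']
```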


CHAPTER III 


LINEAR OPERATORS IN AN n-DIMENSIONAL VECTOR SPACE 


Matrices constitute the fundamental analytic apparatus for the study of 
linear operators in an n-dimensional space. The study of these operators, 
in turn, enables us to divide all matrices into classes and to exhibit the 
significant properties that all matrices of one and the same class have in 
common. 

In the present chapter we shall expound the simpler properties of linear operators in an n-dimensional space. The investigation will be continued in Chapters VII and IX.


§ 1. Vector Spaces


1. Let R be a set of arbitrary elements x, y, z, ... in which two operations are defined:¹ the operation of 'addition' and the operation of 'multiplication by a number of the field F.' We postulate that these operations can always be performed uniquely in R and that the following rules hold for arbitrary elements x, y, z of R and numbers α, β of F:

1. $x + y = y + x$.
2. $(x + y) + z = x + (y + z)$.
3. There exists an element o in R such that the product of the number 0 with any element x of R is equal to o: $0 \cdot x = o$.
4. $1 \cdot x = x$.
5. $\alpha(\beta x) = (\alpha\beta) x$.
6. $(\alpha + \beta) x = \alpha x + \beta x$.
7. $\alpha(x + y) = \alpha x + \alpha y$.

1 These operations will be denoted by the usual signs '+' and '·'; the latter sign will sometimes be omitted.






DEFINITION 1: A set R of elements in which two operations, 'addition' of elements and 'multiplication of elements of R by a number of F,' can always be performed uniquely and for which postulates 1.-7. hold is called a vector space (over the field F), and the elements are called vectors.²


DEFINITION 2. The vectors x, y, ..., u of R are called linearly dependent if there exist numbers α, β, ..., δ in F, not all zero, such that

$$\alpha x + \beta y + \cdots + \delta u = o. \qquad (1)$$

If such a linear dependence does not hold, then the vectors x, y, ..., u are called linearly independent.
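In the number space of Example 2, Definition 2 can be tested mechanically. Vectors of that space are linearly dependent exactly when the rank of the matrix formed from them is smaller than their number. The following is a minimal sketch over the rationals, taking F to be the rational numbers and using `fractions.Fraction`. The helper names `rank` and `linearly_dependent` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors of F^n (F = rationals) by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue                          # no pivot in this column
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_dependent(vectors):
    """True when some non-trivial combination of the vectors gives the null vector."""
    return rank(vectors) < len(vectors)

# (1, 2, 1) = (1, 1, 0) + (0, 1, 1), so these three vectors are dependent
print(linearly_dependent([(1, 1, 0), (0, 1, 1), (1, 2, 1)]))  # True
```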


If the vectors x, y, ..., u are linearly dependent, then one of the vectors can be represented as a linear combination, with coefficients in F, of the remaining ones. For example, if $\alpha \neq 0$ in (1), then

$$x = -\frac{\beta}{\alpha} y - \cdots - \frac{\delta}{\alpha} u.$$

DEFINITION 3. The space R is called finite-dimensional, and the number n is called the dimension of the space, if there exist n linearly independent vectors in R while any $n + 1$ vectors in R are linearly dependent. If the space contains linearly independent systems of an arbitrary number of vectors, then it is called infinite-dimensional.


In this book we shall study mainly finite-dimensional spaces.


DEFINITION 4. A system of n linearly independent vectors $e_1, e_2, \dots, e_n$ of an n-dimensional space, given in a definite order, is called a basis of the space.


2. Example 1. The set of all ordinary vectors (directed geometrical seg- 
ments) is a three-dimensional vector space. The part of this space that 
consists of the vectors parallel to some plane is a two-dimensional space, 
and all the vectors parallel to a given line form a one-dimensional vector 
space. 


Example 2. Let us call a column $x = (x_1, x_2, \dots, x_n)$ of n numbers of F a vector (where n is a fixed number). We define the basic operations as operations on column matrices:


2 It is easy to see that all the usual properties of the operations of addition and of multiplication by a number follow from properties 1.-7. For example, for arbitrary x of R we have:
$x + o = x$ $[x + o = 1 \cdot x + 0 \cdot x = (1 + 0) \cdot x = 1 \cdot x = x]$;
$x + (-x) = o$, where $-x = (-1) \cdot x$;
etc.


$$(x_1, x_2, \dots, x_n) + (y_1, y_2, \dots, y_n) = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n),$$

$$\alpha (x_1, x_2, \dots, x_n) = (\alpha x_1, \alpha x_2, \dots, \alpha x_n).$$

The null vector is the column (0, 0, ..., 0). It is easy to verify that all the postulates 1.-7. are satisfied. The vectors form an n-dimensional space. As a basis of the space we can take, for example, the columns of the unit matrix of order n:

$$(1, 0, \dots, 0), \quad (0, 1, \dots, 0), \quad \dots, \quad (0, 0, \dots, 1).$$


The space thus defined is often called the n-dimensional number space. 


Example 3. The set of all infinite sequences $(x_1, x_2, \dots, x_n, \dots)$ in which the operations are defined in a natural way, i.e.,

$$(x_1, x_2, \dots, x_n, \dots) + (y_1, y_2, \dots, y_n, \dots) = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n, \dots),$$

$$\alpha (x_1, x_2, \dots, x_n, \dots) = (\alpha x_1, \alpha x_2, \dots, \alpha x_n, \dots),$$

is an infinite-dimensional space.


Example 4. The set of polynomials $a_0 + a_1 t + \cdots + a_{n-1} t^{n-1}$ of degree $< n$ with coefficients in F is an n-dimensional vector space.³ As a basis of this space we can take, say, the system of powers $t^0, t^1, \dots, t^{n-1}$.

The set of all such polynomials (without a bound on the degree) forms an infinite-dimensional space.


Example 5. The set of all functions defined on a closed interval $[a, b]$
forms an infinite-dimensional space.


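3. Let the vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ form a basis of an n-dimensional vector
space R and let $\mathbf{x}$ be an arbitrary vector of the space. Then the vectors
$\mathbf{x}, \mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are linearly dependent (because there are n + 1 of them):
$$\alpha_0 \mathbf{x} + \alpha_1 \mathbf{e}_1 + \alpha_2 \mathbf{e}_2 + \cdots + \alpha_n \mathbf{e}_n = \mathbf{o},$$
where at least one of the numbers $\alpha_0, \alpha_1, \ldots, \alpha_n$ is different from zero. But
in this case we must have $\alpha_0 \neq 0$, since the vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ cannot be
linearly dependent. Therefore
$$\mathbf{x} = x_1 \mathbf{e}_1 + x_2 \mathbf{e}_2 + \cdots + x_n \mathbf{e}_n, \tag{2}$$
where $x_i = -\alpha_i / \alpha_0$ $(i = 1, 2, \ldots, n)$.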

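Note that the numbers $x_1, x_2, \ldots, x_n$ are uniquely determined when the
vector $\mathbf{x}$ and the basis $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are given. For if there is another decomposition of $\mathbf{x}$ besides (2),
$$\mathbf{x} = x_1' \mathbf{e}_1 + x_2' \mathbf{e}_2 + \cdots + x_n' \mathbf{e}_n, \tag{3}$$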


3 The basic operations are taken to be ordinary addition of polynomials and multipli- 
cation of a polynomial by a number. 


then, by subtracting (2) from (3), we obtain
$$(x_1' - x_1) \mathbf{e}_1 + (x_2' - x_2) \mathbf{e}_2 + \cdots + (x_n' - x_n) \mathbf{e}_n = \mathbf{o},$$
and since the vectors of a basis are linearly independent, it follows that
$$x_1' - x_1 = x_2' - x_2 = \cdots = x_n' - x_n = 0,$$
i.e.,
$$x_1' = x_1, \quad x_2' = x_2, \quad \ldots, \quad x_n' = x_n. \tag{4}$$
The numbers $x_1, x_2, \ldots, x_n$ are called the coordinates of $\mathbf{x}$ in the basis
$\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$.
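If
$$\mathbf{x} = \sum_{i=1}^n x_i \mathbf{e}_i \quad \text{and} \quad \mathbf{y} = \sum_{i=1}^n y_i \mathbf{e}_i,$$
then
$$\mathbf{x} + \mathbf{y} = \sum_{i=1}^n (x_i + y_i) \mathbf{e}_i \quad \text{and} \quad \alpha \mathbf{x} = \sum_{i=1}^n \alpha x_i \mathbf{e}_i,$$
i.e., the coordinates of a sum of vectors are obtained by addition of the
corresponding coordinates of the summands, and the product of a vector by
a number $\alpha$ is obtained by multiplying all the coordinates of the vector by $\alpha$.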
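The argument above can be made concrete. The following is a minimal sketch that assumes F is the field of rationals, with exact arithmetic supplied by Python's `fractions.Fraction`. It also assumes each vector is given by its components in some fixed reference frame. The helper names `solve` and `coordinates` are illustrative and do not come from the text. Finding the numbers $x_1, \ldots, x_n$ of (2) amounts to solving a square linear system whose columns are the basis vectors. The uniqueness just proved is why that system has exactly one solution.

```python
from fractions import Fraction

def solve(M, b):
    """Solve M c = b for a square non-singular M by Gauss-Jordan elimination over Q."""
    n = len(M)
    # Augmented matrix [M | b] with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        # Raises StopIteration if M is singular (no pivot in this column).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def coordinates(basis, x):
    """Coordinates x_1..x_n of x in the basis e_1..e_n, i.e. the numbers in (2).

    Column k of the system matrix is e_k, so solving M c = x gives
    x = c_1 e_1 + ... + c_n e_n.
    """
    n = len(basis)
    M = [[basis[k][i] for k in range(n)] for i in range(n)]
    return solve(M, x)

basis = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
x, y = [2, 3, 5], [1, 0, 4]
cx = coordinates(basis, x)   # x = 0*e1 + 3*e2 + 2*e3
cy = coordinates(basis, y)
```

Recomputing the coordinates of $\mathbf{x} + \mathbf{y}$ and of a multiple of $\mathbf{x}$ reproduces the coordinate-wise rules stated above. This is a sanity check on one example, not a proof.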
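4. Let the vectors
$$\mathbf{x}_k = \sum_{i=1}^n x_{ik} \mathbf{e}_i \quad (k = 1, 2, \ldots, m)$$
be linearly dependent, i.e.,
$$\sum_{k=1}^m c_k \mathbf{x}_k = \mathbf{o}, \tag{5}$$
where at least one of the numbers $c_1, c_2, \ldots, c_m$ is not equal to zero.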


If a vector is the null vector, then all its coordinates are zero. Hence
the vector equation (5) is equivalent to the following system of scalar
equations:
$$\begin{aligned}
c_1 x_{11} + c_2 x_{12} + \cdots + c_m x_{1m} &= 0,\\
c_1 x_{21} + c_2 x_{22} + \cdots + c_m x_{2m} &= 0,\\
&\vdots\\
c_1 x_{n1} + c_2 x_{n2} + \cdots + c_m x_{nm} &= 0.
\end{aligned} \tag{6}$$


As is well known, this system of homogeneous linear equations for
$c_1, c_2, \ldots, c_m$ has a non-zero solution if and only if the rank of the coefficient
matrix is less than the number of unknowns, i.e., less than m. A necessary
and sufficient condition for the independence of the vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m$
is, therefore, that this rank should be m.




Thus, the following theorem holds: 


THEOREM 1: In order that the vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m$ be linearly independent it is necessary and sufficient that the rank r of the matrix formed
from the coordinates of these vectors in an arbitrary basis
$$\begin{Vmatrix}
x_{11} & x_{12} & \cdots & x_{1m}\\
x_{21} & x_{22} & \cdots & x_{2m}\\
\vdots & \vdots & & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{nm}
\end{Vmatrix} \tag{7}$$
be equal to m, i.e., to the number of vectors.


Note. The linear independence of the vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m$ means that
the columns of the matrix (7) are linearly independent, since the k-th column
consists of the coordinates of $\mathbf{x}_k$ $(k = 1, 2, \ldots, m)$. By the theorem, therefore, if the columns of a matrix are linearly independent, then the rank of
the matrix is equal to the number of columns. Hence it follows that in an
arbitrary rectangular matrix the maximal number of linearly independent
columns is equal to the rank of the matrix. Moreover, if we transpose the
matrix, i.e., change the rows into columns and the columns into rows, then
the rank obviously remains unchanged. Hence in a rectangular matrix the
number of linearly independent columns is always equal to the number of
linearly independent rows and equal to the rank of the matrix.⁴
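Theorem 1 turns linear independence into a rank computation. The sketch below assumes F = ℚ. It computes rank by Gaussian elimination, counting the pivots of a row echelon form. That is a standard technique, but it is not the route via minors used in Chapter I. The names `rank`, `transpose`, and `independent` are illustrative.

```python
from fractions import Fraction

def rank(M):
    """Rank of a rectangular matrix over Q: number of pivots in a row echelon form."""
    rows = [[Fraction(v) for v in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    r = 0  # pivots found so far; rows[:r] are already in echelon form
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # this column has no pivot below row r
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def transpose(M):
    return [list(col) for col in zip(*M)]

def independent(vectors):
    """Theorem 1: x_1..x_m are independent iff the matrix (7), whose k-th column
    holds the coordinates of x_k, has rank m."""
    return rank(transpose(vectors)) == len(vectors)

x1, x2 = [1, 2, 3, 4], [0, 1, 1, 0]
x3 = [a + b for a, b in zip(x1, x2)]   # deliberately dependent: x3 = x1 + x2
```

Applying `rank` to a matrix and to its transpose gives the same number, in line with the Note.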


5. If in an n-dimensional space a basis $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ has been chosen, then
to every vector $\mathbf{x}$ there corresponds uniquely the column $x = (x_1, x_2, \ldots, x_n)$,
where $x_1, x_2, \ldots, x_n$ are the coordinates of $\mathbf{x}$ in the given basis. Thus, the
choice of a basis establishes a one-to-one correspondence between the vectors of an arbitrary n-dimensional vector space R and the vectors of the
n-dimensional number space $R'$ considered in Example 2. Here the sum
of vectors in R corresponds to the sum of the corresponding vectors of $R'$.
The analogous correspondence holds for the product of a vector by a number
$\alpha$ of F. In other words, an arbitrary n-dimensional vector space is isomorphic
to the n-dimensional number space, and therefore all vector spaces of the
same number n of dimensions over the same number field F are isomorphic.
This means that to within isomorphism there exists only one n-dimensional
vector space for a given number field.


4 This proposition follows from Theorem 1, in the proof of which we have started
from the well-known property of a system of linear homogeneous equations: a non-zero 
solution exists only when the rank of the coefficient matrix is less than the number of 
unknowns. For a proof of Theorem 1 independent of this property, see § 5. 




The reader may ask why we have introduced an ‘abstract’ n-dimensional
space if it coincides to within isomorphism with the n-dimensional number
space. Indeed, we could have defined a vector as a system of n numbers
given in a definite order and could have introduced the operations on these
vectors in the very way it was done in Example 2. But we would then have
mixed up properties of vectors that do not depend on the choice of a
basis with properties of a particular basis. For example, the fact that all
the coordinates of a vector are zero is a property of the vector itself; it does
not depend on the choice of basis. But the equality of all its coordinates is
not a property of the vector itself, because it disappears under a change of
basis. The axiomatic definition of a vector space immediately singles out
the properties of vectors that do not depend on the choice of a basis.


§ 2. A Linear Operator Mapping an n-Dimensional Space 
into an m-Dimensional Space 


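1. We consider a linear transformation
$$\begin{aligned}
y_1 &= a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n,\\
y_2 &= a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n,\\
&\vdots\\
y_m &= a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n,
\end{aligned} \tag{8}$$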

whose coefficients belong to the number field F, as well as two vector spaces
over F: an n-dimensional space R and an m-dimensional space S. We choose
a basis $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ in R and a basis $\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_m$ in S. Then the transformation (8) associates with every vector $\mathbf{x} = \sum_{k=1}^n x_k \mathbf{e}_k$ of R a certain vector
$\mathbf{y} = \sum_{i=1}^m y_i \mathbf{g}_i$ of S, i.e., the transformation (8) determines a certain operator
$\mathbf{A}$ that sets up a correspondence between the vector $\mathbf{x}$ and the vector
$\mathbf{y}$: $\mathbf{y} = \mathbf{A}\mathbf{x}$. It is easy to see that this operator $\mathbf{A}$ has the property of linearity, which we formulate as follows:


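DEFINITION 5: An operator $\mathbf{A}$ mapping R into S, i.e., associating with
every vector $\mathbf{x}$ of R a certain vector $\mathbf{y} = \mathbf{A}\mathbf{x}$ of S, is called linear if for arbitrary $\mathbf{x}_1, \mathbf{x}_2$ of R and $\alpha$ of F
$$\mathbf{A}(\mathbf{x}_1 + \mathbf{x}_2) = \mathbf{A}\mathbf{x}_1 + \mathbf{A}\mathbf{x}_2, \qquad \mathbf{A}(\alpha \mathbf{x}_1) = \alpha \mathbf{A}\mathbf{x}_1. \tag{9}$$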


Thus, the transformation (8), for a given basis in R and a given basis
in S, determines a linear operator mapping R into S.




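We shall now show the converse, i.e., that for an arbitrary linear operator
$\mathbf{A}$ mapping R into S and arbitrary bases $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ in R and $\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_m$
in S, there exists a rectangular matrix with elements in F
$$\begin{Vmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{Vmatrix} \tag{10}$$
such that the linear transformation (8) formed by means of this matrix
expresses the coordinates of the transformed vector $\mathbf{y} = \mathbf{A}\mathbf{x}$ in terms of the
coordinates of the original vector $\mathbf{x}$.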
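Let us, in fact, apply the operator $\mathbf{A}$ to the basis vector $\mathbf{e}_k$ and let the
coordinates in the basis $\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_m$ of the vector $\mathbf{A}\mathbf{e}_k$ thus obtained be
denoted by $a_{1k}, a_{2k}, \ldots, a_{mk}$ $(k = 1, 2, \ldots, n)$:
$$\mathbf{A}\mathbf{e}_k = \sum_{i=1}^m a_{ik} \mathbf{g}_i \quad (k = 1, 2, \ldots, n). \tag{11}$$
Multiplying both sides of (11) by $x_k$ and summing from 1 to n, we obtain
$$\sum_{k=1}^n x_k \mathbf{A}\mathbf{e}_k = \sum_{i=1}^m \Big( \sum_{k=1}^n a_{ik} x_k \Big) \mathbf{g}_i;$$
hence
$$\mathbf{y} = \mathbf{A}\mathbf{x} = \mathbf{A}\Big( \sum_{k=1}^n x_k \mathbf{e}_k \Big) = \sum_{k=1}^n x_k \mathbf{A}\mathbf{e}_k = \sum_{i=1}^m y_i \mathbf{g}_i,$$
where
$$y_i = \sum_{k=1}^n a_{ik} x_k \quad (i = 1, 2, \ldots, m),$$
and this is what we had to show.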
Thus, for given bases of R and S, to every linear operator $\mathbf{A}$ mapping R
into S there corresponds a rectangular matrix of dimension m × n and, conversely, to every such matrix there corresponds a linear operator mapping
R into S.

Here, in the matrix A corresponding to the operator $\mathbf{A}$, the k-th column
consists of the coordinates of the vector $\mathbf{A}\mathbf{e}_k$ $(k = 1, 2, \ldots, n)$.


We denote by $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_m)$ the coordinate columns of the vectors $\mathbf{x}$ and $\mathbf{y}$. Then the vector equation
$$\mathbf{y} = \mathbf{A}\mathbf{x}$$
corresponds to the matrix equation
$$y = Ax,$$
which is the matrix form of the transformation (8).


Example. We consider the set of all polynomials in t of degree $\le n - 1$
with coefficients in F. This set forms an n-dimensional vector space $R_n$
(see Example 4, p. 52). Similarly, the polynomials in t of degree $\le n - 2$
with coefficients in F form a space $R_{n-1}$. The differentiation operator $\frac{d}{dt}$
associates with every polynomial of $R_n$ a certain polynomial in $R_{n-1}$. Thus,
this operator maps $R_n$ into $R_{n-1}$. The differentiation operator is linear,
since
$$\frac{d}{dt}[\varphi(t) + \psi(t)] = \frac{d\varphi(t)}{dt} + \frac{d\psi(t)}{dt}, \qquad \frac{d}{dt}[\alpha \varphi(t)] = \alpha \frac{d\varphi(t)}{dt}.$$
In $R_n$ and $R_{n-1}$ we choose bases consisting of powers of t:
$$t^0 = 1, \ t, \ \ldots, \ t^{n-1} \quad \text{and} \quad t^0 = 1, \ t, \ \ldots, \ t^{n-2}.$$
Using formulas (11), we construct the rectangular matrix of dimension
$(n - 1) \times n$ corresponding to the differentiation operator $\frac{d}{dt}$ in these bases:
$$\begin{Vmatrix}
0 & 1 & 0 & \cdots & 0\\
0 & 0 & 2 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & n - 1
\end{Vmatrix}$$
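The differentiation example can be checked mechanically. This sketch uses our own helper names, `diff_matrix` and `apply`, and the exponents in the code are 0-based. It builds the matrix column by column following the rule of (11): column k holds the coordinates of the image of the k-th basis polynomial. It then applies the matrix to a coefficient column.

```python
def diff_matrix(n):
    """Matrix of d/dt from R_n (basis 1, t, ..., t^(n-1)) to R_(n-1)
    (basis 1, t, ..., t^(n-2)); its dimension is (n-1) x n.

    Built column by column as in (11): column k holds the coordinates of
    d/dt t^k = k t^(k-1) (0-based exponents here).
    """
    D = [[0] * n for _ in range(n - 1)]
    for k in range(1, n):
        D[k - 1][k] = k
    return D

def apply(M, x):
    """Matrix-column product y = M x, the matrix form of transformation (8)."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

# p(t) = 5 + 3t - 2t^2 + t^3 as its coordinate column in the basis 1, t, t^2, t^3
p = [5, 3, -2, 1]
D = diff_matrix(4)
dp = apply(D, p)   # coordinates of p'(t) = 3 - 4t + 3t^2
```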


§ 3. Addition and Multiplication of Linear Operators 


1. Let $\mathbf{A}$ and $\mathbf{B}$ be two linear operators mapping R into S and let the corresponding matrices be
$$A = \|a_{ik}\|, \quad B = \|b_{ik}\| \qquad (i = 1, 2, \ldots, m; \ k = 1, 2, \ldots, n).$$


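DEFINITION 6: The sum of the operators $\mathbf{A}$ and $\mathbf{B}$ is the operator $\mathbf{C}$
defined by the equation⁵
$$\mathbf{C}\mathbf{x} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{x} \quad (\mathbf{x} \in R). \tag{12}$$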


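On the basis of this definition it is easy to verify that the sum $\mathbf{C} = \mathbf{A} + \mathbf{B}$
of the linear operators $\mathbf{A}$ and $\mathbf{B}$ is itself a linear operator. Furthermore,
$$\mathbf{C}\mathbf{e}_k = \mathbf{A}\mathbf{e}_k + \mathbf{B}\mathbf{e}_k = \sum_{i=1}^m (a_{ik} + b_{ik}) \mathbf{g}_i.$$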


5 $\mathbf{x} \in R$ means that the element $\mathbf{x}$ belongs to the set R. It is assumed that (12) holds
for arbitrary $\mathbf{x}$ in R.




Hence it follows that the operator $\mathbf{C}$ corresponds to the matrix $C = \|c_{ik}\|$,
where $c_{ik} = a_{ik} + b_{ik}$ $(i = 1, 2, \ldots, m; \ k = 1, 2, \ldots, n)$, i.e., the operator $\mathbf{C}$
corresponds to the matrix
$$C = A + B. \tag{13}$$
We would come to the same conclusion starting from the matrix equation
$$Cx = Ax + Bx \tag{14}$$
(x is the coordinate column of the vector $\mathbf{x}$) corresponding to the vector
equation (12). Since x is an arbitrary column, (13) follows from (14).
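2. Let R, S, and T be three vector spaces of dimension q, n, and m, and let
$\mathbf{A}$ and $\mathbf{B}$ be two linear operators, of which $\mathbf{B}$ maps R into S and $\mathbf{A}$ maps S
into T; in symbols:
$$R \xrightarrow{\ \mathbf{B}\ } S \xrightarrow{\ \mathbf{A}\ } T.$$

DEFINITION 7. The product of the operators $\mathbf{A}$ and $\mathbf{B}$ is the operator $\mathbf{C}$
for which
$$\mathbf{C}\mathbf{x} = \mathbf{A}(\mathbf{B}\mathbf{x}) \quad (\mathbf{x} \in R) \tag{15}$$
holds for every $\mathbf{x}$ of R.
The operator $\mathbf{C}$ maps R into T:
$$R \xrightarrow{\ \mathbf{C}\ } T.$$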


From the linearity of the operators A and B follows the linearity of C. 
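We choose arbitrary bases in R, S, and T and denote by A, B, and C the
matrices corresponding, in this choice of basis, to the operators $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$.
Then the vector equations
$$\mathbf{z} = \mathbf{A}\mathbf{y}, \quad \mathbf{y} = \mathbf{B}\mathbf{x}, \quad \mathbf{z} = \mathbf{C}\mathbf{x} \tag{16}$$
correspond to the matrix equations
$$z = Ay, \quad y = Bx, \quad z = Cx,$$
where x, y, z are the coordinate columns of the vectors $\mathbf{x}, \mathbf{y}, \mathbf{z}$. Hence
$$Cx = A(Bx) = (AB)x,$$
and as the column x is arbitrary,
$$C = AB. \tag{17}$$
Thus, the product $\mathbf{C} = \mathbf{A}\mathbf{B}$ of the operators $\mathbf{A}$ and $\mathbf{B}$ corresponds to the
matrix $C = \|c_{ij}\|$ $(i = 1, 2, \ldots, m; \ j = 1, 2, \ldots, q)$, which is the product
of the matrices A and B.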
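A numerical check of (17) follows, using small integer matrices chosen only for illustration. The operator $\mathbf{C}$ is modelled as the function $x \mapsto A(Bx)$ on coordinate columns. Its matrix is then recovered column by column from the images of the unit columns, and it coincides with the matrix product AB. The helper names are ours.

```python
def matmul(A, B):
    """Product of an m x n and an n x q matrix: c_ij = sum over k of a_ik b_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def apply(M, x):
    """Coordinate column of the image: y = M x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def matrix_of(op, q):
    """Matrix of a linear map given as a function on q-component columns:
    its k-th column is op(e_k), the same rule as in (10)-(11)."""
    cols = [op([1 if i == k else 0 for i in range(q)]) for k in range(q)]
    return [list(row) for row in zip(*cols)]

# Illustrative matrices: B maps R (q = 2) into S (n = 3), A maps S into T (m = 2).
B = [[1, 2], [0, 1], [3, -1]]
A = [[1, 0, 2], [-1, 4, 1]]

def C_op(x):
    """The product operator of Definition 7, Cx = A(Bx), in coordinates."""
    return apply(A, apply(B, x))
```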


We leave it to the reader to show that the operator⁶
$$\mathbf{C} = \alpha \mathbf{A} \quad (\alpha \in F)$$
corresponds to the matrix
$$C = \alpha A.$$


Thus we see that in Chapter I the operations on matrices were so defined
that the sum $\mathbf{A} + \mathbf{B}$, the product $\mathbf{A}\mathbf{B}$, and the product $\alpha \mathbf{A}$ correspond to the
matrices $A + B$, $AB$, and $\alpha A$, respectively, where A and B are the matrices
corresponding to the operators $\mathbf{A}$ and $\mathbf{B}$, and $\alpha$ is a number of F.


§ 4. Transformation of Coordinates 


1. In an n-dimensional vector space we consider two bases: $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$
(the ‘old’ basis) and $\mathbf{e}_1^*, \mathbf{e}_2^*, \ldots, \mathbf{e}_n^*$ (the ‘new’ basis).

The mutual disposition of the basis vectors is determined if the coordinates of the vectors of one basis are given relative to the other basis.


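We set
$$\begin{aligned}
\mathbf{e}_1^* &= t_{11} \mathbf{e}_1 + t_{21} \mathbf{e}_2 + \cdots + t_{n1} \mathbf{e}_n,\\
\mathbf{e}_2^* &= t_{12} \mathbf{e}_1 + t_{22} \mathbf{e}_2 + \cdots + t_{n2} \mathbf{e}_n,\\
&\vdots\\
\mathbf{e}_n^* &= t_{1n} \mathbf{e}_1 + t_{2n} \mathbf{e}_2 + \cdots + t_{nn} \mathbf{e}_n,
\end{aligned} \tag{18}$$
or in abbreviated form,
$$\mathbf{e}_k^* = \sum_{i=1}^n t_{ik} \mathbf{e}_i \quad (k = 1, 2, \ldots, n). \tag{18'}$$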


We shall now establish the connection between the coordinates of one
and the same vector in the two different bases.
Let $x_1, x_2, \ldots, x_n$ and $x_1^*, x_2^*, \ldots, x_n^*$ be the coordinates of the vector
$\mathbf{x}$ relative to the ‘old’ and the ‘new’ bases, respectively:
$$\mathbf{x} = \sum_{i=1}^n x_i \mathbf{e}_i = \sum_{k=1}^n x_k^* \mathbf{e}_k^*. \tag{19}$$


In (19) we substitute for the vectors $\mathbf{e}_k^*$ the expressions given for them in
(18). We obtain:


6 I.e., the operator for which $\mathbf{C}\mathbf{x} = \alpha \mathbf{A}\mathbf{x}$ $(\mathbf{x} \in R)$.




$$\mathbf{x} = \sum_{k=1}^n x_k^* \sum_{i=1}^n t_{ik} \mathbf{e}_i = \sum_{i=1}^n \Big( \sum_{k=1}^n t_{ik} x_k^* \Big) \mathbf{e}_i.$$

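Comparing this with (19) and bearing in mind that the coordinates of a
vector are uniquely determined when the vector and the basis are given,
we find:
$$x_i = \sum_{k=1}^n t_{ik} x_k^* \quad (i = 1, 2, \ldots, n), \tag{20}$$
or in explicit form:
$$\begin{aligned}
x_1 &= t_{11} x_1^* + t_{12} x_2^* + \cdots + t_{1n} x_n^*,\\
x_2 &= t_{21} x_1^* + t_{22} x_2^* + \cdots + t_{2n} x_n^*,\\
&\vdots\\
x_n &= t_{n1} x_1^* + t_{n2} x_2^* + \cdots + t_{nn} x_n^*.
\end{aligned} \tag{21}$$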

Formulas (21) determine the transformation of the coordinates of a
vector on transition from one basis to another. They express the ‘old’
coordinates in terms of the ‘new’ ones. The matrix
$$T = \|t_{ik}\|_1^n \tag{22}$$
is called the matrix of the coordinate transformation or the transforming
matrix. Its k-th column consists of the ‘old’ coordinates of the k-th ‘new’
basis vector. This follows from formulas (18) or immediately from (21) if
we set in the latter $x_k^* = 1$, $x_i^* = 0$ for $i \neq k$.

Note that the matrix T is non-singular, i.e.,
$$|T| \neq 0. \tag{23}$$
For when we set in (21) $x_1 = x_2 = \cdots = x_n = 0$, we obtain a system of n
linear homogeneous equations in the n unknowns $x_1^*, x_2^*, \ldots, x_n^*$ with determinant $|T|$. This system can only have the zero solution $x_1^* = 0, x_2^* = 0, \ldots, x_n^* = 0$, since otherwise (19) would imply a linear dependence among the
vectors $\mathbf{e}_1^*, \mathbf{e}_2^*, \ldots, \mathbf{e}_n^*$. Therefore $|T| \neq 0$.⁷

We now introduce the column matrices $x = (x_1, x_2, \ldots, x_n)$ and $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$. Then the formulas (21) for the coordinate transformation
can be written in the form of the following matrix equation:
$$x = T x^*. \tag{24}$$
Multiplying both sides of this equation by $T^{-1}$, we obtain the expression
for the inverse transformation
$$x^* = T^{-1} x. \tag{25}$$
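A small sketch of (24)-(25) follows. It assumes F = ℚ and a hypothetical pair of bases in a 2-dimensional space, with $\mathbf{e}_1^* = \mathbf{e}_1 + \mathbf{e}_2$ and $\mathbf{e}_2^* = \mathbf{e}_2$. The inverse $T^{-1}$ is computed by Gauss-Jordan elimination, which is one standard way to obtain it. `inverse` and `apply` are illustrative helpers.

```python
from fractions import Fraction

def inverse(T):
    """Inverse of a non-singular square matrix over Q (Gauss-Jordan on [T | E])."""
    n = len(T)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(T)]
    for col in range(n):
        # A pivot always exists here because |T| != 0, cf. (23).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def apply(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

# Illustrative choice of bases: e1* = e1 + e2, e2* = e2.
# As in (18), the k-th column of T holds the old coordinates of e_k*.
T = [[1, 0],
     [1, 1]]
x_old = [3, 5]                     # x = 3 e1 + 5 e2
x_new = apply(inverse(T), x_old)   # formula (25): x* = T^{-1} x
```

Here $\mathbf{x} = 3\mathbf{e}_1 + 5\mathbf{e}_2 = 3\mathbf{e}_1^* + 2\mathbf{e}_2^*$. Multiplying $x^*$ by T gives back the ‘old’ column, as (24) requires.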


7 The inequality (23) also follows from Theorem 1 (p. 54), because the elements of T
are the ‘old’ coordinates of the linearly independent vectors $\mathbf{e}_1^*, \mathbf{e}_2^*, \ldots, \mathbf{e}_n^*$.




§ 5. Equivalent Matrices. The Rank of an Operator. 
Sylvester’s Inequality. 


1. Let R and S be two vector spaces of dimension n and m, respectively,
over the number field F and let $\mathbf{A}$ be a linear operator mapping R into S.
In the present section we shall make clear how the matrix A corresponding
to the given linear operator $\mathbf{A}$ changes when the bases in R and S are changed.

We choose arbitrary bases $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ in R and $\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_m$ in S.
In these bases the operator $\mathbf{A}$ corresponds to a matrix $A = \|a_{ik}\|$ $(i = 1, 2, \ldots, m; \ k = 1, 2, \ldots, n)$. To the vector equation
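$$\mathbf{y} = \mathbf{A}\mathbf{x} \tag{26}$$
there corresponds the matrix equation
$$y = Ax, \tag{27}$$
where x and y are the coordinate columns for the vectors $\mathbf{x}$ and $\mathbf{y}$ in the
bases $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ and $\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_m$.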


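We now choose other bases $\mathbf{e}_1^*, \mathbf{e}_2^*, \ldots, \mathbf{e}_n^*$ and $\mathbf{g}_1^*, \mathbf{g}_2^*, \ldots, \mathbf{g}_m^*$ in R and
S. In the new bases we shall have $x^*, y^*, A^*$ instead of x, y, A. Here
$$y^* = A^* x^*. \tag{28}$$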


Let us denote by Q and N the non-singular square matrices of order n and m,
respectively, that realize the coordinate transformations in the spaces R and
S on transition from the old bases to the new ones (see § 4):
$$x = Q x^*, \quad y = N y^*. \tag{29}$$
Then we obtain from (27) and (29):
$$y^* = N^{-1} y = N^{-1} A x = N^{-1} A Q x^*. \tag{30}$$
Setting $P = N^{-1}$, we find from (28) and (30):
$$A^* = PAQ. \tag{31}$$


DEFINITION 8: Two rectangular matrices A and B of the same dimension
are called equivalent if there exist two non-singular matrices P and Q
such that⁸
$$B = PAQ. \tag{32}$$


8 If the matrices A and B are of dimension m × n, then in (32) the square matrix P
is of order m, and Q of order n. If the elements of the equivalent matrices A and B belong
to some number field, then P and Q may be chosen such that their elements belong to the
same number field.




From (31) it follows that two matrices corresponding to one and the 
same linear operator A for different choices of bases in R and S are always 
equivalent. It is easy to see that, conversely, if a matrix A corresponds to 
the operator A for certain bases in R and S, and if a matrix B is equivalent 
to A, then it corresponds to the same linear operator for certain other bases 
in R and S. 

Thus, to every linear operator mapping R into S there corresponds a
class of equivalent matrices with elements in F.


2. The following theorem establishes a criterion for the equivalence of two
matrices:


THEOREM 2: Two rectangular matrices of the same dimension are equivalent
if and only if they have the same rank.


Proof. The condition is necessary. When a rectangular matrix is multi- 
plied by an arbitrary non-singular square matrix (on the right or left), 
then its rank does not change (see Chapter I, p. 17). Therefore it follows 
from (32) that 


$$r_A = r_B.$$


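The condition is sufficient. Let A be a rectangular matrix of dimension
m × n. It determines a linear operator $\mathbf{A}$ mapping the space R with the
basis $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ into the space S with the basis $\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_m$. Let r
denote the number of linearly independent vectors among the vectors
$\mathbf{A}\mathbf{e}_1, \mathbf{A}\mathbf{e}_2, \ldots, \mathbf{A}\mathbf{e}_n$. Without loss of generality we may assume that the
vectors $\mathbf{A}\mathbf{e}_1, \mathbf{A}\mathbf{e}_2, \ldots, \mathbf{A}\mathbf{e}_r$ are linearly independent⁹ and that the remaining $\mathbf{A}\mathbf{e}_{r+1}, \mathbf{A}\mathbf{e}_{r+2}, \ldots, \mathbf{A}\mathbf{e}_n$ are expressed linearly in terms of them:
$$\mathbf{A}\mathbf{e}_k = \sum_{j=1}^r c_{kj} \mathbf{A}\mathbf{e}_j \quad (k = r + 1, \ldots, n). \tag{33}$$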


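We define a new basis in R as follows:
$$\mathbf{e}_k^* = \begin{cases} \mathbf{e}_k & (k = 1, 2, \ldots, r),\\ \mathbf{e}_k - \sum_{j=1}^r c_{kj} \mathbf{e}_j & (k = r + 1, \ldots, n). \end{cases} \tag{34}$$
Then by (33),
$$\mathbf{A}\mathbf{e}_k^* = \mathbf{o} \quad (k = r + 1, \ldots, n). \tag{35}$$
Next, we set
$$\mathbf{A}\mathbf{e}_j^* = \mathbf{g}_j^* \quad (j = 1, 2, \ldots, r). \tag{36}$$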


9 This can be achieved by a suitable numbering of the basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$.




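The vectors $\mathbf{g}_1^*, \mathbf{g}_2^*, \ldots, \mathbf{g}_r^*$ are linearly independent (by (34) and (36) they coincide with $\mathbf{A}\mathbf{e}_1, \ldots, \mathbf{A}\mathbf{e}_r$). We supplement them
with suitable vectors $\mathbf{g}_{r+1}^*, \mathbf{g}_{r+2}^*, \ldots, \mathbf{g}_m^*$ to obtain a basis $\mathbf{g}_1^*, \mathbf{g}_2^*, \ldots, \mathbf{g}_m^*$
of S.

The matrix corresponding to the same operator $\mathbf{A}$ in the new bases
$\mathbf{e}_1^*, \mathbf{e}_2^*, \ldots, \mathbf{e}_n^*$ and $\mathbf{g}_1^*, \mathbf{g}_2^*, \ldots, \mathbf{g}_m^*$ has now, by (35) and (36), the form
$$I_r = \begin{Vmatrix}
1 & 0 & \cdots & 0 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0 & 0 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 1 & 0 & \cdots & 0\\
0 & 0 & \cdots & 0 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{Vmatrix}. \tag{37}$$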

Along the main diagonal of $I_r$, starting at the top, there are r units; all the
remaining elements of $I_r$ are zeros. Since the matrices A and $I_r$ correspond
to one and the same operator $\mathbf{A}$, they are equivalent. As we have proved,
equivalent matrices have the same rank. Hence the rank of the original
matrix A is r.

We have shown that an arbitrary rectangular matrix of rank r is equivalent to the ‘canonical’ matrix $I_r$. But $I_r$ is completely determined by specifying its dimensions m × n and the number r. Therefore all rectangular
matrices of given dimension m × n and of given rank r are equivalent to one
and the same matrix $I_r$, and consequently to each other. This completes the
proof of the theorem.
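The sufficiency part of the proof can also be carried out directly on the matrix. The sketch below does not follow the basis construction (33)-(36). Instead it uses elementary row operations, which amount to a change of basis in S, and elementary column operations, which amount to a change of basis in R. Both kinds of operation are recorded in P and Q, so that $PAQ = I_r$ holds at the end. The sketch assumes F = ℚ; `canonical_form`, `identity`, and `matmul` are our names.

```python
from fractions import Fraction

def identity(k):
    return [[Fraction(int(i == j)) for j in range(k)] for i in range(k)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def canonical_form(A):
    """Return (P, Q, r) with P, Q non-singular and P A Q equal to I_r of (37).

    Invariant kept throughout: M == P A Q (M starts as A, P and Q as unit matrices).
    Row operations are applied to M and P; column operations to M and Q.
    """
    m, n = len(A), len(A[0])
    M = [[Fraction(v) for v in row] for row in A]
    P, Q = identity(m), identity(n)
    r = 0
    while True:
        # Look for a non-zero entry in the part of M not yet reduced.
        pivot = next(((i, j) for i in range(r, m) for j in range(r, n) if M[i][j] != 0), None)
        if pivot is None:
            break
        i, j = pivot
        M[r], M[i] = M[i], M[r]          # row swap
        P[r], P[i] = P[i], P[r]
        for row in M:                    # column swap
            row[r], row[j] = row[j], row[r]
        for row in Q:
            row[r], row[j] = row[j], row[r]
        p = M[r][r]                      # scale the pivot row so the pivot is 1
        M[r] = [v / p for v in M[r]]
        P[r] = [v / p for v in P[r]]
        for k in range(m):               # clear column r by row operations
            if k != r and M[k][r] != 0:
                f = M[k][r]
                M[k] = [a - f * b for a, b in zip(M[k], M[r])]
                P[k] = [a - f * b for a, b in zip(P[k], P[r])]
        for c in range(n):               # clear row r by column operations
            if c != r and M[r][c] != 0:
                f = M[r][c]
                for row in M:
                    row[c] -= f * row[r]
                for row in Q:
                    row[c] -= f * row[r]
        r += 1
    return P, Q, r

A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]   # rank 2: the second row is twice the first
P, Q, r = canonical_form(A)
```

Two matrices of the same dimension and rank reduce to the same $I_r$. Composing one reduction with the inverse of the other therefore exhibits their equivalence, as in the proof.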


3. Let $\mathbf{A}$ be a linear operator mapping an n-dimensional space R into an
m-dimensional space S. The set of all vectors of the form $\mathbf{A}\mathbf{x}$, where $\mathbf{x} \in R$,
forms a vector space.¹⁰ This space will be denoted by $\mathbf{A}R$; it is part of the
space S or, as we shall say, is a subspace of S.

Together with the subspace $\mathbf{A}R$ of S we consider the set of all vectors
$\mathbf{x} \in R$ that satisfy the equation
$$\mathbf{A}\mathbf{x} = \mathbf{o}. \tag{38}$$
These vectors also form a subspace of R, which we shall denote by $N_A$.


10 The set of vectors of the form $\mathbf{A}\mathbf{x}$ $(\mathbf{x} \in R)$ satisfies the postulates 1.-7. of § 1,
because the sum of two such vectors and the product of such a vector by a number are also
vectors of this form.




DEFINITION 9: If a linear operator $\mathbf{A}$ maps R into S, then the dimension
r of the space $\mathbf{A}R$ is called the rank of $\mathbf{A}$,¹¹ and the dimension d of the space
$N_A$ consisting of all vectors $\mathbf{x} \in R$ that satisfy the condition (38) is called
the defect, or nullity, of $\mathbf{A}$.

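Among all the equivalent rectangular matrices that describe a given
operator $\mathbf{A}$ in distinct bases there occurs the canonical matrix $I_r$ (see (37)).
We denote the corresponding bases of R and S by $\mathbf{e}_1^*, \mathbf{e}_2^*, \ldots, \mathbf{e}_n^*$ and
$\mathbf{g}_1^*, \mathbf{g}_2^*, \ldots, \mathbf{g}_m^*$. Then
$$\mathbf{A}\mathbf{e}_1^* = \mathbf{g}_1^*, \ \ldots, \ \mathbf{A}\mathbf{e}_r^* = \mathbf{g}_r^*, \qquad \mathbf{A}\mathbf{e}_{r+1}^* = \cdots = \mathbf{A}\mathbf{e}_n^* = \mathbf{o}.$$
From the definitions of $\mathbf{A}R$ and $N_A$ it follows that the vectors $\mathbf{g}_1^*, \mathbf{g}_2^*, \ldots, \mathbf{g}_r^*$
form a basis of $\mathbf{A}R$ and that the vectors $\mathbf{e}_{r+1}^*, \mathbf{e}_{r+2}^*, \ldots, \mathbf{e}_n^*$ form a basis of
$N_A$. Hence it follows that r is the rank of the operator $\mathbf{A}$ and that
$$d = n - r. \tag{39}$$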
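Formula (39) can be checked on an example. This is a sketch over ℚ, and it uses the reduced row echelon form as the computational tool; that choice is ours, not the argument of the text. Each non-pivot column yields one basis vector of $N_A$. The defect is therefore n minus the number of pivots, i.e., $n - r$. The names `rref` and `null_space` are illustrative.

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form over Q; returns (R, pivot_columns)."""
    R = [[Fraction(v) for v in row] for row in A]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        i = next((k for k in range(r, m) if R[k][c] != 0), None)
        if i is None:
            continue
        R[r], R[i] = R[i], R[r]
        p = R[r][c]
        R[r] = [v / p for v in R[r]]
        for k in range(m):
            if k != r and R[k][c] != 0:
                f = R[k][c]
                R[k] = [a - f * b for a, b in zip(R[k], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

def null_space(A):
    """Basis of N_A = {x : Ax = o}: one vector per free (non-pivot) column.
    Returns (basis, rank)."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, pc in enumerate(pivots):
            v[pc] = -R[row][free]   # solve each pivot variable from its row
        basis.append(v)
    return basis, len(pivots)

A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]   # m = 3, n = 4; the third row is the sum of the first two
N, r = null_space(A)
```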


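If A is an arbitrary matrix corresponding to $\mathbf{A}$, then it is equivalent to
$I_r$ and therefore has the same rank r. Thus, the rank of an operator $\mathbf{A}$ coincides with the rank of the rectangular matrix
$$A = \begin{Vmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{Vmatrix}$$
determined by $\mathbf{A}$ in arbitrary bases $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n \in R$ and $\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_m \in S$.
The columns of A are formed by the coordinate columns of the vectors $\mathbf{A}\mathbf{e}_1, \ldots, \mathbf{A}\mathbf{e}_n$.
Since it follows from $\mathbf{x} = \sum_{i=1}^n x_i \mathbf{e}_i$ that $\mathbf{A}\mathbf{x} = \sum_{i=1}^n x_i \mathbf{A}\mathbf{e}_i$, the rank of $\mathbf{A}$, i.e.,
the dimension of $\mathbf{A}R$, is equal to the maximal number of linearly independent vectors among $\mathbf{A}\mathbf{e}_1, \mathbf{A}\mathbf{e}_2, \ldots, \mathbf{A}\mathbf{e}_n$. Thus:

The rank of a matrix coincides with the number of linearly independent
columns of the matrix.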

Since under transposition the rows of a matrix become its columns and
the rank remains unchanged:

11 The dimension of the space $\mathbf{A}R$ never exceeds the dimension of R, so that $r \le n$.
This follows from the fact that the equation $\mathbf{x} = \sum_{i=1}^n x_i \mathbf{e}_i$ (where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ is a
basis of R) implies the equation $\mathbf{A}\mathbf{x} = \sum_{i=1}^n x_i \mathbf{A}\mathbf{e}_i$.




The number of linearly independent rows of a matrix is also equal to
the rank of the matrix.¹²


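4. Let $\mathbf{A}$ and $\mathbf{B}$ be two linear operators and let $\mathbf{C} = \mathbf{A}\mathbf{B}$ be their product.
Suppose that the operator $\mathbf{B}$ maps R into S and that the operator $\mathbf{A}$ maps S
into T. Then the operator $\mathbf{C}$ maps R into T:
$$R \xrightarrow{\ \mathbf{B}\ } S \xrightarrow{\ \mathbf{A}\ } T, \qquad R \xrightarrow{\ \mathbf{C}\ } T.$$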


We introduce the matrices A, B, C corresponding to $\mathbf{A}, \mathbf{B}, \mathbf{C}$ in some
choice of bases in R, S, and T. Then the matrix equation $C = AB$ will correspond to the operator equation $\mathbf{C} = \mathbf{A}\mathbf{B}$.

We denote by $r_A, r_B, r_C$ the ranks of the operators $\mathbf{A}, \mathbf{B}, \mathbf{C}$ or, what is the
same, of the matrices A, B, C. These numbers determine the dimensions of
the subspaces $\mathbf{A}S$, $\mathbf{B}R$, $\mathbf{A}(\mathbf{B}R)$. Since $\mathbf{B}R \subset S$,¹³ we have $\mathbf{A}(\mathbf{B}R) \subset \mathbf{A}S$.
Moreover, the dimension of $\mathbf{A}(\mathbf{B}R)$ cannot exceed the dimension of $\mathbf{B}R$.¹⁴
Therefore
$$r_C \le r_A, \qquad r_C \le r_B.$$


These inequalities were obtained in Chapter I, § 2 from the formula for the 
minors of a product of two matrices. 

Let us regard $\mathbf{A}$ as an operator mapping $\mathbf{B}R$ into T. Then the rank of
this operator is equal to the dimension of the space $\mathbf{A}(\mathbf{B}R)$, i.e., to $r_C$. Therefore, by applying (39) we obtain
$$r_C = r_B - d_1, \tag{40}$$
where $d_1$ is the maximal number of linearly independent vectors of $\mathbf{B}R$ that
satisfy the equation
$$\mathbf{A}\mathbf{x} = \mathbf{o}. \tag{41}$$
But all the solutions of this equation that belong to S form a subspace of
dimension d, where
$$d = n - r_A \tag{42}$$
is the defect of the operator $\mathbf{A}$ mapping S into T. Since $\mathbf{B}R \subset S$,
$$d_1 \le d. \tag{43}$$
From (40), (42), and (43) we find:
$$r_A + r_B - n \le r_C.$$


12 In § 1 we reached these conclusions on the basis of other arguments (see p. 54). 
13 $R \subset S$ means that the set R forms part of the set S.
14 See Footnote 11. 




Thus we have obtained Sylvester’s inequality for the rank of the product
of two rectangular matrices A and B of dimensions m × n and n × q:
$$r_A + r_B - n \le r_{AB} \le \min(r_A, r_B). \tag{44}$$
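Sylvester's inequality can be spot-checked numerically. The sketch below is a test, not a proof. It builds random integer matrices of deliberately low rank with `low_rank_matrix`, a hypothetical helper introduced only for this check. It then computes ranks over ℚ by Gaussian elimination and checks both sides of (44). The 2 × 2 pair at the end shows that the lower bound can be attained.

```python
import random
from fractions import Fraction

def rank(M):
    """Rank over Q via row echelon form (count of pivots)."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def sylvester_holds(A, B):
    """Check (44): r_A + r_B - n <= r_AB <= min(r_A, r_B) for A (m x n), B (n x q)."""
    n = len(B)
    rA, rB, rAB = rank(A), rank(B), rank(matmul(A, B))
    return rA + rB - n <= rAB <= min(rA, rB)

def low_rank_matrix(rows, cols, k, rng):
    """Hypothetical test helper: a (rows x k)(k x cols) product of random integer
    factors, so its rank is at most k; rank-deficient cases become common."""
    U = [[rng.randint(-3, 3) for _ in range(k)] for _ in range(rows)]
    V = [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(k)]
    return matmul(U, V)

rng = random.Random(0)
trials = [(low_rank_matrix(4, 5, rng.randint(1, 4), rng),
           low_rank_matrix(5, 3, rng.randint(1, 3), rng)) for _ in range(200)]

# Equality in the lower bound: r_A = r_B = 1, n = 2, and AB = 0.
A0 = [[1, 0], [0, 0]]
B0 = [[0, 0], [0, 1]]
```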


§ 6. Linear Operators Mapping an n-Dimensional Space into Itself 


1. A linear operator mapping the n-dimensional vector space R into itself
(here R = S and n = m) will be referred to simply as a linear operator in R.
The sum of two linear operators in R and the product of such an operator
by a number are also linear operators in R. Multiplication of two such
operators is always feasible, and this product is also a linear operator in R.
Hence the linear operators in R form a ring.¹⁵ This ring has an identity
operator, namely the operator $\mathbf{E}$ for which
$$\mathbf{E}\mathbf{x} = \mathbf{x} \quad (\mathbf{x} \in R). \tag{45}$$
For every operator $\mathbf{A}$ in R we have
$$\mathbf{E}\mathbf{A} = \mathbf{A}\mathbf{E} = \mathbf{A}.$$


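If $\mathbf{A}$ is a linear operator in R, then the powers $\mathbf{A}^2 = \mathbf{A}\mathbf{A}$, $\mathbf{A}^3 = \mathbf{A}\mathbf{A}\mathbf{A}$,
and in general $\mathbf{A}^m = \underbrace{\mathbf{A}\mathbf{A}\cdots\mathbf{A}}_{m \text{ times}}$ have a meaning. We set $\mathbf{A}^0 = \mathbf{E}$. Then it is
easy to see that for all non-negative integers p and q we have
$$\mathbf{A}^p \mathbf{A}^q = \mathbf{A}^{p+q}.$$
Let $f(t) = \alpha_0 t^m + \alpha_1 t^{m-1} + \cdots + \alpha_{m-1} t + \alpha_m$ be a polynomial in a scalar
argument t with coefficients in the field F. Then we set:
$$f(\mathbf{A}) = \alpha_0 \mathbf{A}^m + \alpha_1 \mathbf{A}^{m-1} + \cdots + \alpha_{m-1} \mathbf{A} + \alpha_m \mathbf{E}. \tag{46}$$
Here $f(\mathbf{A}) g(\mathbf{A}) = g(\mathbf{A}) f(\mathbf{A})$ for any two polynomials f(t) and g(t).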
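Once a basis is fixed, $f(\mathbf{A})$ corresponds to the matrix $f(A)$, so definition (46) translates directly into code. This sketch evaluates $f(A)$ by Horner's rule; that evaluation scheme is our choice, since it avoids storing powers. It works on an illustrative 2 × 2 matrix and checks that two polynomials in the same matrix commute. `poly_of_matrix` is our name.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def poly_of_matrix(coeffs, A):
    """f(A) = a_0 A^m + a_1 A^(m-1) + ... + a_(m-1) A + a_m E, as in (46).

    coeffs = [a_0, a_1, ..., a_m] (highest power first). Evaluated by Horner's rule:
    f(A) = (...((a_0 E) A + a_1 E) A + ...) A + a_m E.
    """
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += c   # adding c E only touches the diagonal
    return result

A = [[1, 2],
     [0, 3]]
F = poly_of_matrix([1, 0, 1], A)   # f(t) = t^2 + 1
G = poly_of_matrix([2, -5], A)     # g(t) = 2t - 5
commute = matmul(F, G) == matmul(G, F)
```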
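Let
$$\mathbf{y} = \mathbf{A}\mathbf{x} \quad (\mathbf{x}, \mathbf{y} \in R).$$
We denote by $x_1, x_2, \ldots, x_n$ the coordinates of the vector $\mathbf{x}$ in an arbitrary
basis $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ and by $y_1, y_2, \ldots, y_n$ the coordinates of $\mathbf{y}$ in the same
basis. Then
$$y_i = \sum_{k=1}^n a_{ik} x_k \quad (i = 1, 2, \ldots, n). \tag{47}$$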


15 This ring is in fact an algebra. See Chapter I, p. 17. 




In the basis $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ the linear operator $\mathbf{A}$ corresponds to a square
matrix $A = \|a_{ik}\|_1^n$.¹⁶ We remind the reader (see § 2) that in the k-th
column of this matrix are to be found the coordinates of the vector $\mathbf{A}\mathbf{e}_k$
$(k = 1, 2, \ldots, n)$. Introducing the coordinate columns $x = (x_1, x_2, \ldots, x_n)$
and $y = (y_1, y_2, \ldots, y_n)$, we can write the transformation (47) in matrix
form
$$y = Ax. \tag{48}$$

The sum and product of two operators $\mathbf{A}$ and $\mathbf{B}$ correspond to the sum
and product of the corresponding square matrices $A = \|a_{ik}\|_1^n$ and
$B = \|b_{ik}\|_1^n$. The product $\alpha \mathbf{A}$ corresponds to the matrix $\alpha A$. The identity
operator $\mathbf{E}$ corresponds to the square unit matrix $E = \|\delta_{ik}\|_1^n$. Thus, the
choice of a basis establishes an isomorphism between the ring of linear operators in R and the ring of square matrices of order n with elements in F. In
this isomorphism the polynomial $f(\mathbf{A})$ corresponds to the matrix $f(A)$.

Let us consider, apart from the basis $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$, another basis
$\mathbf{e}_1^*, \mathbf{e}_2^*, \ldots, \mathbf{e}_n^*$ of R. Then, in analogy with (48), we have
$$y^* = A^* x^*, \tag{49}$$
where $x^*, y^*$ are the column matrices formed from the coordinates of the
vectors $\mathbf{x}, \mathbf{y}$ in the basis $\mathbf{e}_1^*, \mathbf{e}_2^*, \ldots, \mathbf{e}_n^*$ and $A^* = \|a_{ik}^*\|_1^n$ is the square matrix
corresponding to the operator $\mathbf{A}$ in this basis. We rewrite in matrix form
the formulas for the transformation of coordinates
$$x = T x^*, \quad y = T y^*. \tag{50}$$
Then from (48) and (50) we find:
$$y^* = T^{-1} A T x^*,$$
and a comparison with (49) gives:
$$A^* = T^{-1} A T. \tag{51}$$
Formula (51) is a special case of (31) on p. 61 (namely, $P = T^{-1}$ and
$Q = T$).
DEFINITION 10: Two matrices A and B connected by the relation
$$B = T^{-1} A T, \tag{51'}$$
where T is a non-singular matrix, are called similar.¹⁷


16 See § 2 of this chapter. In this case the spaces R and S coincide; in the same way,
the bases $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ and $\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_m$ of these spaces are identified.




Thus, we have shown that two matrices corresponding to one and the
same linear operator in R for distinct bases are similar and the matrix T
linking these matrices coincides with the matrix of the coordinate transformation in the transition from the first basis to the second (see (50)).

In other words, to a linear operator in R there corresponds a whole class
of similar matrices; they represent the given operator in various bases.

In studying properties of a linear operator in R, we are at the same time 
studying the matrix properties that are common to the whole class of similar 
matrices, that is, that remain unchanged, or invariant, under transition from 
a given matrix to a similar one. 

We note at once that two similar matrices always have the same determi- 
nant. For it follows from (51’) that 

$$|B| = |T|^{-1} \, |A| \, |T| = |A|. \tag{52}$$

The equation $|B| = |A|$ is a necessary, but not a sufficient condition
for the similarity of the matrices A and B.
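The following is a numerical illustration of (52) and of the remark that follows, with illustrative matrices over ℚ. The determinant is computed by cofactor expansion, which is adequate only for tiny matrices, and $T^{-1}$ by Gauss-Jordan elimination. All helper names are ours. The second part shows why equal determinants do not suffice. Since $T^{-1}ET = E$ for every non-singular T, the unit matrix E is similar only to itself. Yet $|E| = |J| = 1$ for $J = \begin{Vmatrix} 1 & 1\\ 0 & 1 \end{Vmatrix}$.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse(T):
    """Inverse of a non-singular square matrix over Q (Gauss-Jordan on [T | E])."""
    n = len(T)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(T)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
T = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]   # |T| = 2, so T is non-singular
B = matmul(matmul(inverse(T), A), T)    # B = T^{-1} A T, relation (51')

# Equal determinants, yet E is similar only to itself.
E = [[1, 0], [0, 1]]
J = [[1, 1], [0, 1]]
```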

In Chapter VI we shall establish a criterion for the similarity of two
matrices, i.e., we shall give necessary and sufficient conditions for two
square matrices of order n to be similar.

In accordance with (52) we may define the determinant $|\mathbf{A}|$ of a linear
operator $\mathbf{A}$ in R as the determinant of an arbitrary matrix corresponding
to the given operator.

If $|\mathbf{A}| = 0$ $(\neq 0)$, then the operator $\mathbf{A}$ is called singular (non-singular).
In accordance with this definition a singular (non-singular) operator corresponds to a singular (non-singular) matrix in any basis. For a singular
operator:

1) There always exists a vector $\mathbf{x} \neq \mathbf{o}$ such that $\mathbf{A}\mathbf{x} = \mathbf{o}$;
2) $\mathbf{A}R$ is a proper part of R.

For a non-singular operator:
1) $\mathbf{A}\mathbf{x} = \mathbf{o}$ implies that $\mathbf{x} = \mathbf{o}$;
2) $\mathbf{A}R = R$, i.e., the vectors of the form $\mathbf{A}\mathbf{x}$ $(\mathbf{x} \in R)$ fill out the whole
space R.


In other words, a linear operator in R is singular or non-singular depending 
on whether its defect is positive or zero. 


17 The matrix T can always be chosen such that its elements belong to the same basic
number field F as those of A and B. It is easy to verify the three properties of similar
matrices:

Reflexivity (a matrix A is always similar to itself) ; 

Symmetry (if A is similar to B, then B is similar to A); and 

Transitivity (if A is similar to B, and B to C, then A is similar to C). 




§ 7. Characteristic Values and Characteristic Vectors 
of a Linear Operator 


1. An important role in the study of the structure of a linear operator $\mathbf{A}$
in R is played by the vectors $\mathbf{x}$ for which
$$\mathbf{A}\mathbf{x} = \lambda \mathbf{x} \qquad (\lambda \in F, \ \mathbf{x} \neq \mathbf{o}). \tag{53}$$
Such vectors are called characteristic vectors and the numbers $\lambda$ corresponding to them are called characteristic values or characteristic roots of
the operator $\mathbf{A}$ (or of the matrix A).†

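In order to find the characteristic values and characteristic vectors of an
operator $\mathbf{A}$ we choose an arbitrary basis $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ in R. Let $\mathbf{x} = \sum_{i=1}^n x_i \mathbf{e}_i$
and let $A = \|a_{ik}\|_1^n$ be the matrix corresponding to $\mathbf{A}$ in this basis
$\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$. Then if we equate the corresponding coordinates of the vectors on the left-hand and right-hand sides of (53), we obtain a system of
scalar equations
$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= \lambda x_1,\\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= \lambda x_2,\\
&\vdots\\
a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n &= \lambda x_n,
\end{aligned} \tag{54}$$
which can also be written as
$$\begin{aligned}
(a_{11} - \lambda) x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= 0,\\
a_{21} x_1 + (a_{22} - \lambda) x_2 + \cdots + a_{2n} x_n &= 0,\\
&\vdots\\
a_{n1} x_1 + a_{n2} x_2 + \cdots + (a_{nn} - \lambda) x_n &= 0.
\end{aligned} \tag{55}$$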


Since the required vector must not be the null vector, at least one of its
coordinates $x_1, x_2, \ldots, x_n$ must be different from zero.

In order that the system of linear homogeneous equations (55) should
have a non-zero solution it is necessary and sufficient that the determinant
of the system be zero:
$$\begin{vmatrix}
a_{11} - \lambda & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} - \lambda & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda
\end{vmatrix} = 0. \tag{56}$$


† Other terms in use for the former are: proper vector, latent vector, eigenvector.
Other terms for the latter are: proper value, latent value, latent root, latent number,
characteristic number, eigenvalue, etc.




The equation (56) is an algebraic equation of degree n in $\lambda$. Its coefficients belong to the same number field F as the elements of the matrix
$A = \|a_{ik}\|_1^n$.

Equation (56) occurs in various problems of geometry, mechanics,
astronomy, and physics and is known as the characteristic equation or the
secular equation¹⁸ of the matrix $A = \|a_{ik}\|_1^n$ (the left-hand side is called
the characteristic polynomial).

Thus, every characteristic value 4 of a linear operator A is a root of the 
characteristic equation (56). And conversely, ‘f a number 4 is a root of 
(56), then for this value 2 the system (55) and hence (54) has a non-zero 
solution 2%, 22,..., 2n,1¢., to this number 4 there corresponds a characteristic 
vector #-== S2z,e, of the operator A. 

From what we have shown, it follows that every hnear operator A in R 
has not more than # distinct characteristic values. 

If F is the field of complex numbers, then every linear operator in R 
always has at least one characteristic vector in R corresponding to a charac- 
teristic value 2..° This follows from the fundamental theorem of algebra, 
according to which an algebraic equation (06) in the field of complex 
numbers always has at least one root. 

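Let us write (56) in explicit form

$$|A - \lambda E| = (-\lambda)^n + S_1(-\lambda)^{n-1} + S_2(-\lambda)^{n-2} + \cdots + S_{n-1}(-\lambda) + S_n = 0. \tag{57}$$

It is easy to see that here

$$S_1 = \sum_{i=1}^{n} a_{ii}, \qquad S_2 = \sum_{1 \le i < k \le n} \begin{vmatrix} a_{ii} & a_{ik} \\ a_{ki} & a_{kk} \end{vmatrix}, \quad \ldots \tag{58}$$

and, in general, $S_p$ is the sum of the principal minors of order $p$ of the matrix $A = \| a_{ik} \|_1^n$ ($p = 1, 2, \ldots, n$).²⁰ In particular, $S_n = |A|$.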
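The statement that $S_p$ is the sum of the principal minors of order $p$ can be checked mechanically. The following Python sketch is only an illustration: the helper names `det`, `principal_minor_sums` and `det_A_minus_lambda_E` are our own, exact rational arithmetic is an assumed choice, and the 3×3 matrix is merely a test case. It computes $S_1, \ldots, S_n$ and evaluates the right-hand side of (57).

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small n)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def principal_minor_sums(a):
    """[S_1, ..., S_n]: S_p is the sum of all principal minors of order p, cf. (58)."""
    n = len(a)
    return [sum(det([[Fraction(a[i][k]) for k in idx] for i in idx])
                for idx in combinations(range(n), p))
            for p in range(1, n + 1)]


def det_A_minus_lambda_E(a, lam):
    """Right-hand side of (57): sum of S_p * (-lam)**(n - p) for p = 0..n, with S_0 = 1."""
    n = len(a)
    s = [1] + principal_minor_sums(a)
    return sum(s[p] * (-lam) ** (n - p) for p in range(n + 1))


A = [[2, -1, 1], [0, 1, 1], [-1, 1, 1]]
print(principal_minor_sums(A))     # [Fraction(4, 1), Fraction(5, 1), Fraction(2, 1)]
print(det_A_minus_lambda_E(A, 3))  # -4, the value of |A - 3E|
```

Here $S_1 = 4$ is the trace and $S_3 = 2$ is $|A|$, in agreement with the remark following (58).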


We denote by $\tilde{A}$ the matrix corresponding to the same operator $A$ in another basis. $\tilde{A}$ is similar to $A$:


18 The name is due to the fact that this equation occurs in the study of secular perturbations of the planets.

19 This proposition is valid even in the more general case in which $F$ is an arbitrary algebraically closed field, i.e., a field that contains the roots of all algebraic equations with coefficients in the field.


20 The power $(-\lambda)^{n-p}$ occurs only in those terms of the characteristic determinant (56) that contain precisely $n-p$ of the diagonal elements, say,

$$a_{j_1 j_1} - \lambda,\quad a_{j_2 j_2} - \lambda,\quad \ldots,\quad a_{j_{n-p} j_{n-p}} - \lambda.$$

The product of these diagonal elements occurs in the expansion of the determinant (56)



$$\tilde{A} = T^{-1} A T.$$

Hence

$$\tilde{A} - \lambda E = T^{-1}(A - \lambda E)T$$

and therefore

$$|\tilde{A} - \lambda E| = |A - \lambda E|. \tag{59}$$

Thus, similar matrices $A$ and $\tilde{A}$ have the same characteristic polynomial.

This polynomial is sometimes called the characteristic polynomial of the operator $A$ and is denoted by $|A - \lambda E|$.


If $x, y, z, \ldots$ are linearly independent characteristic vectors of an operator $A$ corresponding to one and the same characteristic value $\lambda$, and $\alpha, \beta, \gamma, \ldots$ are arbitrary numbers of $F$, then the vector $\alpha x + \beta y + \gamma z + \cdots$ is either equal to zero or is also a characteristic vector of $A$ corresponding to the same $\lambda$.

For from

$$Ax = \lambda x, \quad Ay = \lambda y, \quad Az = \lambda z, \quad \ldots$$

it follows that

$$A(\alpha x + \beta y + \gamma z + \cdots) = \lambda(\alpha x + \beta y + \gamma z + \cdots).$$


In other words, linearly independent characteristic vectors corresponding to one and the same characteristic value $\lambda$ form a basis of a 'characteristic' subspace each non-zero vector of which is a characteristic vector for the same $\lambda$. In particular, each characteristic vector generates a one-dimensional subspace, a 'characteristic' direction.

However, if characteristic vectors of a linear operator $A$ correspond to distinct characteristic values, then a linear combination of these characteristic vectors is not, in general, a characteristic vector of $A$.

The significance of the characteristic vectors and characteristic numbers 
for the study of linear operators will be illustrated in the next section by the 
example of operators of simple structure. 


with a factor in which the term free of $\lambda$ is the principal minor

$$A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ i_1 & i_2 & \cdots & i_p \end{pmatrix},$$

where $i_1, i_2, \ldots, i_p$ together with $j_1, j_2, \ldots, j_{n-p}$ form a complete set of indices $1, 2, \ldots, n$; hence in the development of (56) we have

$$|A - \lambda E| = (a_{j_1 j_1} - \lambda)(a_{j_2 j_2} - \lambda)\cdots(a_{j_{n-p} j_{n-p}} - \lambda)\, A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ i_1 & i_2 & \cdots & i_p \end{pmatrix} + \cdots.$$

When we take all possible combinations $j_1, j_2, \ldots, j_{n-p}$ of $n-p$ of the indices $1, 2, \ldots, n$, we obtain for the coefficient $S_p$ of $(-\lambda)^{n-p}$ the sum of all principal minors of order $p$ in $A$.




§ 8. Linear Operators of Simple Structure 


1. We begin with the following lemma. 


Lemma: Characteristic vectors belonging to pairwise distinct characteristic values are always linearly independent.


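Proof. Let

$$Ax_k = \lambda_k x_k \qquad (x_k \neq o;\; \lambda_i \neq \lambda_k \text{ for } i \neq k;\; i, k = 1, 2, \ldots, m) \tag{60}$$

and

$$\sum_{k=1}^{m} c_k x_k = o. \tag{61}$$

Applying the operator $A$ to both sides we obtain:

$$\sum_{k=1}^{m} c_k \lambda_k x_k = o. \tag{62}$$

We multiply both sides of (61) by $\lambda_1$ and subtract the resulting equation from (62) term by term. Then we obtain

$$\sum_{k=2}^{m} c_k (\lambda_k - \lambda_1) x_k = o. \tag{63}$$

We can say that (63) is obtained from (61) by termwise application of the operator $A - \lambda_1 E$. If we apply the operators $A - \lambda_2 E, \ldots, A - \lambda_{m-1} E$ to (63) term by term, we are led to the following equation:

$$c_m (\lambda_m - \lambda_{m-1})(\lambda_m - \lambda_{m-2}) \cdots (\lambda_m - \lambda_1)\, x_m = o.$$

Since $x_m \neq o$ and all the factors $\lambda_m - \lambda_i$ ($i < m$) are different from zero, it follows that $c_m = 0$. Since any of the summands in (61) can be put last, we have in (61)

$$c_1 = c_2 = \cdots = c_m = 0,$$

i.e., there is no linear dependence among the vectors $x_1, x_2, \ldots, x_m$. This proves the lemma.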

If the characteristic equation of an operator has $n$ distinct roots and these roots belong to $F$, then by the lemma the characteristic vectors belonging to these roots are linearly independent.

DEFINITION 11: A linear operator $A$ in $R$ is said to be an operator of simple structure if $A$ has $n$ linearly independent characteristic vectors in $R$, where $n$ is the dimension of $R$.

Thus, a linear operator in $R$ has simple structure if all the roots of the characteristic equation are distinct and belong to $F$. However, these conditions are not necessary. There exist linear operators of simple structure whose characteristic polynomial has multiple roots; for example, the unit operator $E$ has simple structure, although its characteristic polynomial is $(\lambda - 1)^n$.

Let us consider an arbitrary linear operator $A$ of simple structure. We denote by $g_1, g_2, \ldots, g_n$ a basis of $R$ consisting of characteristic vectors of the operator, i.e.,

$$Ag_k = \lambda_k g_k \qquad (k = 1, 2, \ldots, n).$$

If

$$x = \sum_{k=1}^{n} x_k g_k,$$

then

$$Ax = \sum_{k=1}^{n} x_k A g_k = \sum_{k=1}^{n} \lambda_k x_k g_k.$$

The effect of the operator $A$ of simple structure on the vector $x = \sum_{k=1}^{n} x_k g_k$ may be put into words as follows:

In the $n$-dimensional space $R$ there exist $n$ linearly independent 'directions' along which the operator $A$ of simple structure realizes a 'dilatation' with coefficients $\lambda_1, \lambda_2, \ldots, \lambda_n$. An arbitrary vector $x$ may be decomposed into components along these characteristic directions. These components are subject to the corresponding 'dilatations' and their sum then gives the vector $Ax$.


It is easy to see that to the operator $A$ in a 'characteristic' basis $g_1, g_2, \ldots, g_n$ there corresponds the diagonal matrix

$$\| \lambda_i \delta_{ik} \|_1^n.$$

If we denote by $A$ the matrix corresponding to $A$ in an arbitrary basis $e_1, e_2, \ldots, e_n$, then

$$A = T \| \lambda_i \delta_{ik} \|_1^n T^{-1}. \tag{64}$$


A matrix that is similar (p. 68) to a diagonal matrix is called a matrix of 
simple structure. Thus, to an operator of simple structure there corresponds 
in any basis a matrix of simple structure, and vice versa. 


2. The matrix $T$ in (64) realizes the transition from the basis $e_1, e_2, \ldots, e_n$ to the basis $g_1, g_2, \ldots, g_n$. The $k$-th column of $T$ contains the coordinates of a characteristic vector $g_k$ (with respect to $e_1, e_2, \ldots, e_n$) that corresponds to the characteristic value $\lambda_k$ of $A$ ($k = 1, 2, \ldots, n$). The matrix $T$ is called the fundamental matrix for $A$.


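We rewrite (64) as follows:

$$A = T L T^{-1} \qquad (L = \{\lambda_1, \lambda_2, \ldots, \lambda_n\}). \tag{64'}$$

On going over to the $p$-th compound matrices ($1 \le p \le n$), we obtain (see Chapter I, § 4):

$$\mathfrak{A}_p = \mathfrak{T}_p \mathfrak{L}_p \mathfrak{T}_p^{-1}. \tag{65}$$

$\mathfrak{L}_p$ is a diagonal matrix of order $N$ $\left(N = \binom{n}{p}\right)$ along whose main diagonal are all the possible products of $\lambda_1, \lambda_2, \ldots, \lambda_n$ taken $p$ at a time. A comparison of (65) with (64') yields the following theorem: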
THEOREM 3: If a matrix $A = \| a_{ik} \|_1^n$ has simple structure, then for every $p \le n$ the compound matrix $\mathfrak{A}_p$ also has simple structure; moreover, the characteristic values of $\mathfrak{A}_p$ are all the possible products $\lambda_{i_1}\lambda_{i_2}\cdots\lambda_{i_p}$ ($1 \le i_1 < i_2 < \cdots < i_p \le n$) of $p$ of the characteristic values $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$, and the fundamental matrix of $\mathfrak{A}_p$ is the compound $\mathfrak{T}_p$ of the fundamental matrix $T$ of $A$.

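COROLLARY: If a characteristic value $\lambda_k$ of a matrix of simple structure $A = \| a_{ik} \|_1^n$ corresponds to a characteristic vector with the coordinates $t_{1k}, t_{2k}, \ldots, t_{nk}$ ($k = 1, 2, \ldots, n$) and if $T = \| t_{ik} \|_1^n$, then the characteristic value $\lambda_{k_1}\lambda_{k_2}\cdots\lambda_{k_p}$ ($1 \le k_1 < k_2 < \cdots < k_p \le n$) of $\mathfrak{A}_p$ corresponds to the characteristic vector with coordinates

$$T\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} \qquad (1 \le i_1 < i_2 < \cdots < i_p \le n). \tag{66}$$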
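Theorem 3 and its Corollary can be checked numerically. In the sketch below, an illustration only, the matrix $T$, the values 2, 3, 5 and all function names are hypothetical choices. We build $A = TLT^{-1}$ with exact fractions, form second compound matrices from all minors of order 2 (rows and columns indexed by index pairs in lexicographic order), and confirm that each column of $\mathfrak{T}_2$ is a characteristic vector of $\mathfrak{A}_2$ with value $\lambda_{k_1}\lambda_{k_2}$, as in (66).

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def inverse(m):
    """Gauss-Jordan inverse over the rationals (m is assumed non-singular)."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        piv = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[piv] = a[piv], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]


def compound(m, p):
    """p-th compound matrix: all minors of order p, index sets in lexicographic order."""
    idx = list(combinations(range(len(m)), p))
    return [[det([[m[i][k] for k in cols] for i in rows]) for cols in idx] for rows in idx]


T = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]       # columns: characteristic vectors (hypothetical)
lams = [2, 3, 5]                            # characteristic values (hypothetical)
L = [[lams[i] if i == j else 0 for j in range(3)] for i in range(3)]
A = matmul(matmul(T, L), inverse(T))        # A = T L T^{-1}, cf. (64')

A2, T2 = compound(A, 2), compound(T, 2)
products = [lams[k1] * lams[k2] for k1, k2 in combinations(range(3), 2)]
L2 = [[products[i] if i == j else 0 for j in range(3)] for i in range(3)]
print(matmul(A2, T2) == matmul(T2, L2))     # True: (65) in the form A_2 T_2 = T_2 L_2
```

The check relies on the multiplicativity of compound matrices (Chapter I, § 4), which turns $AT = TL$ into $\mathfrak{A}_2\mathfrak{T}_2 = \mathfrak{T}_2\mathfrak{L}_2$.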
An arbitrary matrix $A = \| a_{ik} \|_1^n$ may be represented as the limit of a sequence of matrices $A_m$ ($m \to \infty$) each of which does not have multiple characteristic values and, therefore, has simple structure. The characteristic values $\lambda_1^{(m)}, \lambda_2^{(m)}, \ldots, \lambda_n^{(m)}$ of the matrix $A_m$ converge for $m \to \infty$ to the characteristic values $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$:

$$\lim_{m \to \infty} \lambda_k^{(m)} = \lambda_k \qquad (k = 1, 2, \ldots, n).$$


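Hence

$$\lim_{m \to \infty} \lambda_{k_1}^{(m)} \lambda_{k_2}^{(m)} \cdots \lambda_{k_p}^{(m)} = \lambda_{k_1}\lambda_{k_2}\cdots\lambda_{k_p} \qquad (1 \le k_1 < k_2 < \cdots < k_p \le n).$$

Moreover, since $\lim_{m \to \infty} (\mathfrak{A}_m)_p = \mathfrak{A}_p$, where $(\mathfrak{A}_m)_p$ denotes the $p$-th compound matrix of $A_m$, we deduce from Theorem 3: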




THEOREM 4 (Kronecker): If $\lambda_1, \lambda_2, \ldots, \lambda_n$ is a complete system of characteristic values of an arbitrary matrix $A$, then a complete system of characteristic values of the compound matrix $\mathfrak{A}_p$ consists of all possible products of the numbers $\lambda_1, \lambda_2, \ldots, \lambda_n$ taken $p$ at a time ($p = 1, 2, \ldots, n$).
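Theorem 4 also covers matrices that are not of simple structure. As a hypothetical test case, the matrix $A$ below (the same matrix appears in the example of Chapter IV, § 4) has characteristic values $1, 1, 2$ but only one independent characteristic vector for the value 1, so Theorem 3 does not apply. A minimal sketch, with helper names of our own: the sums of principal minors of $\mathfrak{A}_2$, which by (57) and (58) determine its characteristic polynomial, should coincide with the elementary symmetric functions of the products $1\cdot 1, 1\cdot 2, 1\cdot 2$.

```python
from itertools import combinations
from math import prod


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def compound(m, p):
    """p-th compound matrix, index sets in lexicographic order."""
    idx = list(combinations(range(len(m)), p))
    return [[det([[m[i][k] for k in cols] for i in rows]) for cols in idx] for rows in idx]


def principal_minor_sums(a):
    """[S_1, ..., S_n] as in (58)."""
    n = len(a)
    return [sum(det([[a[i][k] for k in idx] for i in idx]) for idx in combinations(range(n), p))
            for p in range(1, n + 1)]


def elementary_symmetric(values):
    """[e_1, ..., e_n] of the given numbers."""
    return [sum(prod(c) for c in combinations(values, p)) for p in range(1, len(values) + 1)]


A = [[2, -1, 1], [0, 1, 1], [-1, 1, 1]]
char_values = [1, 1, 2]
pair_products = [x * y for x, y in combinations(char_values, 2)]   # [1, 2, 2]
A2 = compound(A, 2)
print(A2)                                                          # [[2, 2, -2], [1, 3, -2], [1, 1, 0]]
print(principal_minor_sums(A2), elementary_symmetric(pair_products))   # [5, 8, 4] [5, 8, 4]
```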

In the present section we have investigated operators and matrices of simple structure. The study of the structure of operators and matrices of general type will be resumed in Chapters VI and VII.


CHAPTER IV 


THE CHARACTERISTIC POLYNOMIAL AND THE 
MINIMAL POLYNOMIAL OF A MATRIX 


Two polynomials are associated with every square matrix: the characteristic polynomial and the minimal polynomial. These polynomials play an important role in various problems of the theory of matrices. For example, the concept of a function of a matrix, which we shall introduce in the next chapter, will be based entirely on the concept of the minimal polynomial. In the present chapter, the properties of the characteristic polynomial and the minimal polynomial are studied. A prerequisite to this investigation is some basic information about polynomials with matrix coefficients and operations on them.


§ 1. Addition and Multiplication of Matrix Polynomials 


1. We consider a square polynomial matrix $A(\lambda)$, i.e., a square matrix whose elements are polynomials in $\lambda$ (with coefficients in the given number field $F$):

$$A(\lambda) = \| a_{ik}(\lambda) \|_1^n = \| a_{ik}^{(0)} \lambda^m + a_{ik}^{(1)} \lambda^{m-1} + \cdots + a_{ik}^{(m)} \|_1^n. \tag{1}$$

The matrix $A(\lambda)$ can be represented in the form of a polynomial with matrix coefficients arranged with respect to the powers of $\lambda$:

$$A(\lambda) = A_0 \lambda^m + A_1 \lambda^{m-1} + \cdots + A_m, \tag{2}$$

where

$$A_j = \| a_{ik}^{(j)} \|_1^n \qquad (j = 0, 1, \ldots, m). \tag{3}$$


The number $m$ is called the degree of the polynomial, provided $A_0 \neq O$. The number $n$ is called the order of the polynomial. The polynomial (1) is called regular if $|A_0| \neq 0$.

A polynomial with matrix coefficients will sometimes be called a matrix polynomial. In contrast to a matrix polynomial an ordinary polynomial with scalar coefficients will be called a scalar polynomial.




§ 2. RIGHT AND LEFT DIVISION OF MATRIX POLYNOMIALS 77


We shall now consider the fundamental operations on matrix polynomials. Let two matrix polynomials $A(\lambda)$ and $B(\lambda)$ of the same order be given. We denote by $m$ the larger of their degrees. These polynomials can be written in the form

$$A(\lambda) = A_0\lambda^m + A_1\lambda^{m-1} + \cdots + A_m,$$
$$B(\lambda) = B_0\lambda^m + B_1\lambda^{m-1} + \cdots + B_m.$$

Then

$$A(\lambda) \pm B(\lambda) = (A_0 \pm B_0)\lambda^m + (A_1 \pm B_1)\lambda^{m-1} + \cdots + (A_m \pm B_m),$$

i.e.: The sum (difference) of two matrix polynomials of the same order can be represented in the form of a polynomial whose degree does not exceed the larger of the degrees of the given polynomials.

Let $A(\lambda)$ and $B(\lambda)$ be two matrix polynomials of the same order $n$ and of respective degrees $m$ and $p$:

$$A(\lambda) = A_0\lambda^m + A_1\lambda^{m-1} + \cdots + A_m \qquad (A_0 \neq O),$$
$$B(\lambda) = B_0\lambda^p + B_1\lambda^{p-1} + \cdots + B_p \qquad (B_0 \neq O).$$

Then

$$A(\lambda)B(\lambda) = A_0B_0\lambda^{m+p} + (A_0B_1 + A_1B_0)\lambda^{m+p-1} + \cdots + A_mB_p. \tag{4}$$

If we multiply $B(\lambda)$ by $A(\lambda)$ (i.e., interchange the order of the factors), then we obtain, in general, a different polynomial.


2. The multiplication of matrix polynomials has a specific property. In contrast to the product of scalar polynomials, the product (4) of matrix polynomials may have a degree less than $m + p$, i.e., less than the sum of the degrees of the factors. For, in (4) the product $A_0B_0$ may be the null matrix even though $A_0 \neq O$, $B_0 \neq O$. However, if at least one of the matrices $A_0$ and $B_0$ is non-singular, then it follows from $A_0 \neq O$ and $B_0 \neq O$ that $A_0B_0 \neq O$. Thus: The product of two matrix polynomials is a polynomial whose degree is less than or equal to the sum of the degrees of the factors. If at least one of the two factors is regular, then the degree of the product is always equal to the sum of the degrees of the factors.
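The drop in degree is easy to observe. In this sketch, which uses an assumed representation rather than notation from the text, a matrix polynomial is a list of coefficient matrices $[A_0, A_1, \ldots, A_m]$, highest power first, and the hypothetical function `poly_mul` implements formula (4) coefficient by coefficient. The first pair of factors has degree 1 each, with singular leading coefficients satisfying $A_0B_0 = O$; the second pair has a regular factor.

```python
def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def mat_add(x, y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def poly_mul(a, b):
    """Product A(λ)B(λ) for coefficient lists ordered from the highest power down, cf. (4)."""
    n = len(a[0])
    out = [[[0] * n for _ in range(n)] for _ in range(len(a) + len(b) - 1)]
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = mat_add(out[i + j], mat_mul(ai, bj))
    return out


def degree(poly):
    """Degree after discarding vanishing leading coefficients (None for the zero polynomial)."""
    for k, c in enumerate(poly):
        if any(v != 0 for row in c for v in row):
            return len(poly) - 1 - k
    return None


E = [[1, 0], [0, 1]]
A = [[[1, 0], [0, 0]], E]          # A(λ) = A_0 λ + E with A_0 singular
B = [[[0, 0], [0, 1]], E]          # B(λ) = B_0 λ + E with A_0 B_0 = O
print(degree(poly_mul(A, B)))      # 1, although deg A + deg B = 2

C = [E, [[0, 1], [0, 0]]]          # C(λ) = E λ + N is regular
print(degree(poly_mul(C, B)))      # 2 = deg C + deg B
```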


§ 2. Right and Left Division of Matrix Polynomials 


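1. Let $A(\lambda)$ and $B(\lambda)$ be two matrix polynomials of the same order $n$, and let $B(\lambda)$ be regular:

$$A(\lambda) = A_0\lambda^m + A_1\lambda^{m-1} + \cdots + A_m \qquad (A_0 \neq O),$$
$$B(\lambda) = B_0\lambda^p + B_1\lambda^{p-1} + \cdots + B_p \qquad (|B_0| \neq 0).$$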


78 IV. CHARACTERISTIC AND MINIMAL POLYNOMIAL OF A MATRIX 


We shall say that the matrix polynomials $Q(\lambda)$ and $R(\lambda)$ are the right quotient and the right remainder, respectively, of $A(\lambda)$ on division by $B(\lambda)$ if

$$A(\lambda) = Q(\lambda)B(\lambda) + R(\lambda) \tag{5}$$

and if the degree of $R(\lambda)$ is less than that of $B(\lambda)$.

Similarly, we shall call the polynomials $\hat{Q}(\lambda)$ and $\hat{R}(\lambda)$ the left quotient and the left remainder of $A(\lambda)$ on division by $B(\lambda)$ if

$$A(\lambda) = B(\lambda)\hat{Q}(\lambda) + \hat{R}(\lambda) \tag{6}$$

and if the degree of $\hat{R}(\lambda)$ is less than that of $B(\lambda)$.

The reader should note that in the 'right' division (i.e., when the right quotient and the right remainder are to be found) in (5) the quotient $Q(\lambda)$ is multiplied by the 'divisor' $B(\lambda)$ on the right, and in the 'left' division in (6) the quotient $\hat{Q}(\lambda)$ is multiplied by the divisor $B(\lambda)$ on the left. The polynomials $\hat{Q}(\lambda)$ and $\hat{R}(\lambda)$ do not, in general, coincide with $Q(\lambda)$ and $R(\lambda)$.

2. We shall now show that both right and left division of matrix polynomials of the same order are always possible and unique, provided the divisor is a regular polynomial.

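Let us consider the right division of $A(\lambda)$ by $B(\lambda)$. If $m < p$, we can set $Q(\lambda) = O$, $R(\lambda) = A(\lambda)$. If $m \ge p$, we apply the usual scheme for the division of a polynomial by a polynomial in order to find the quotient $Q(\lambda)$ and the remainder $R(\lambda)$. We 'divide' the highest term of the dividend $A_0\lambda^m$ by the highest term of the divisor $B_0\lambda^p$. We obtain the highest term $A_0B_0^{-1}\lambda^{m-p}$ of the required quotient. We multiply this term on the right by the divisor $B(\lambda)$ and subtract the product so obtained from $A(\lambda)$. Thus we find the 'first remainder' $A^{(1)}(\lambda)$:

$$A(\lambda) = A_0B_0^{-1}\lambda^{m-p}B(\lambda) + A^{(1)}(\lambda). \tag{7}$$

The degree $m^{(1)}$ of $A^{(1)}(\lambda)$ is less than $m$:

$$A^{(1)}(\lambda) = A_0^{(1)}\lambda^{m^{(1)}} + \cdots \qquad (A_0^{(1)} \neq O,\; m^{(1)} < m). \tag{8}$$

If $m^{(1)} \ge p$, then we repeat the process and obtain:

$$A^{(1)}(\lambda) = A_0^{(1)}B_0^{-1}\lambda^{m^{(1)}-p}B(\lambda) + A^{(2)}(\lambda), \tag{9}$$
$$A^{(2)}(\lambda) = A_0^{(2)}\lambda^{m^{(2)}} + \cdots \qquad (m^{(2)} < m^{(1)}),$$

etc.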




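Since the degrees of $A(\lambda), A^{(1)}(\lambda), A^{(2)}(\lambda), \ldots$ decrease, at some stage we arrive at a remainder $R(\lambda)$ whose degree is less than $p$. Then it follows from (7) and (9) that

$$A(\lambda) = Q(\lambda)B(\lambda) + R(\lambda),$$

where

$$Q(\lambda) = A_0B_0^{-1}\lambda^{m-p} + A_0^{(1)}B_0^{-1}\lambda^{m^{(1)}-p} + \cdots. \tag{10}$$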
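The division scheme (7)-(10) translates directly into code. The sketch below is our own illustration: polynomials are stored as dictionaries mapping a power of $\lambda$ to its coefficient matrix (an assumed representation), exact fractions avoid rounding, and the hypothetical function `right_divide` repeatedly removes the leading term with $A_0^{(i)}B_0^{-1}$ exactly as in (7) and (9). The data at the end are those of the worked example given below.

```python
from fractions import Fraction


def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def mat_sub(x, y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def is_zero(m):
    return all(v == 0 for row in m for v in row)


def inverse(m):
    """Gauss-Jordan inverse over the rationals; m must be non-singular."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        piv = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[piv] = a[piv], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]


def right_divide(a, b):
    """Return (Q, R) with A(λ) = Q(λ)B(λ) + R(λ) and deg R < deg B; B must be regular."""
    p = max(b)
    b0_inv = inverse(b[p])                       # B_0^{-1}
    n = len(b[p])
    zero = [[0] * n for _ in range(n)]
    rem = {k: v for k, v in a.items() if not is_zero(v)}
    q = {}
    while rem and max(rem) >= p:
        m = max(rem)
        term = mat_mul(rem[m], b0_inv)           # leading coefficient times B_0^{-1}
        q[m - p] = term
        for k, bk in b.items():                  # subtract term * λ^(m-p) * B(λ)
            rem[m - p + k] = mat_sub(rem.get(m - p + k, zero), mat_mul(term, bk))
        rem = {k: v for k, v in rem.items() if not is_zero(v)}
    return q, rem


# Data of the worked example; keys are powers of λ.
A = {3: [[1, 2], [-1, 3]], 2: [[0, 1], [-2, 0]], 1: [[1, 0], [0, 1]], 0: [[0, 0], [1, 0]]}
B = {2: [[2, -1], [-1, 1]], 0: [[3, 1], [-1, 2]]}
Q, R = right_divide(A, B)
print(Q)
print(R)
```

The quotient and remainder printed (as `Fraction` entries) should be those of the example, $Q(\lambda) = \| 3\ 5;\ 2\ 5 \|\lambda + \| 1\ 2;\ -2\ -2 \|$ and $R(\lambda)$ of degree 1. Left division could be obtained the same way from the transposes, as in the footnote below.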


We shall now prove the uniqueness of the right division. Suppose we have simultaneously

$$A(\lambda) = Q(\lambda)B(\lambda) + R(\lambda) \tag{11}$$

and

$$A(\lambda) = Q^*(\lambda)B(\lambda) + R^*(\lambda), \tag{12}$$

where the degrees of $R(\lambda)$ and $R^*(\lambda)$ are less than that of $B(\lambda)$, i.e., less than $p$. Subtracting (11) from (12) term by term we obtain

$$[Q(\lambda) - Q^*(\lambda)]B(\lambda) = R^*(\lambda) - R(\lambda). \tag{13}$$

If we had $Q(\lambda) - Q^*(\lambda) \neq O$, then the degree on the left-hand side of (13) would be the sum of the degrees of $B(\lambda)$ and $Q(\lambda) - Q^*(\lambda)$, because $|B_0| \neq 0$, and would therefore be at least equal to $p$. This is impossible, since the degree of the polynomial on the right-hand side of (13) is less than $p$. Thus, $Q(\lambda) - Q^*(\lambda) = O$, and then it follows from (13) that $R(\lambda) - R^*(\lambda) = O$, i.e.,

$$Q(\lambda) = Q^*(\lambda), \qquad R(\lambda) = R^*(\lambda).$$


The existence and uniqueness of the left quotient and left remainder is established similarly.¹

1 Note that the possibility and uniqueness of the left division of $A(\lambda)$ by $B(\lambda)$ follows from that of the right division of the transposed matrices $A^T(\lambda)$ and $B^T(\lambda)$. (The regularity of $B(\lambda)$ implies that of $B^T(\lambda)$.) For from

$$A^T(\lambda) = Q_1(\lambda)B^T(\lambda) + R_1(\lambda)$$

it follows (see Chapter I, p. 19) that

$$A(\lambda) = B(\lambda)Q_1^T(\lambda) + R_1^T(\lambda). \tag{6'}$$

By the same reasoning, the left division of $A(\lambda)$ by $B(\lambda)$ is unique; for if it were not, then the right division of $A^T(\lambda)$ by $B^T(\lambda)$ would not be unique. Comparison of (6) and (6') gives

$$\hat{Q}(\lambda) = Q_1^T(\lambda), \qquad \hat{R}(\lambda) = R_1^T(\lambda).$$




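Example.

$$A(\lambda) = \begin{Vmatrix} \lambda^3 + \lambda & 2\lambda^3 + \lambda^2 \\ -\lambda^3 - 2\lambda^2 + 1 & 3\lambda^3 + \lambda \end{Vmatrix} = \begin{Vmatrix} 1 & 2 \\ -1 & 3 \end{Vmatrix}\lambda^3 + \begin{Vmatrix} 0 & 1 \\ -2 & 0 \end{Vmatrix}\lambda^2 + \begin{Vmatrix} 1 & 0 \\ 0 & 1 \end{Vmatrix}\lambda + \begin{Vmatrix} 0 & 0 \\ 1 & 0 \end{Vmatrix},$$

$$B(\lambda) = \begin{Vmatrix} 2\lambda^2 + 3 & -\lambda^2 + 1 \\ -\lambda^2 - 1 & \lambda^2 + 2 \end{Vmatrix} = \begin{Vmatrix} 2 & -1 \\ -1 & 1 \end{Vmatrix}\lambda^2 + \begin{Vmatrix} 3 & 1 \\ -1 & 2 \end{Vmatrix},$$

$$|B_0| = 1, \qquad B_0^{-1} = \begin{Vmatrix} 1 & 1 \\ 1 & 2 \end{Vmatrix}, \qquad A_0B_0^{-1} = \begin{Vmatrix} 3 & 5 \\ 2 & 5 \end{Vmatrix},$$

$$A_0B_0^{-1}\lambda B(\lambda) = \begin{Vmatrix} \lambda^3 + 4\lambda & 2\lambda^3 + 13\lambda \\ -\lambda^3 + \lambda & 3\lambda^3 + 12\lambda \end{Vmatrix},$$

$$A^{(1)}(\lambda) = A(\lambda) - A_0B_0^{-1}\lambda B(\lambda) = \begin{Vmatrix} -3\lambda & \lambda^2 - 13\lambda \\ -2\lambda^2 - \lambda + 1 & -11\lambda \end{Vmatrix} = \begin{Vmatrix} 0 & 1 \\ -2 & 0 \end{Vmatrix}\lambda^2 + \begin{Vmatrix} -3 & -13 \\ -1 & -11 \end{Vmatrix}\lambda + \begin{Vmatrix} 0 & 0 \\ 1 & 0 \end{Vmatrix},$$

$$A_0^{(1)}B_0^{-1} = \begin{Vmatrix} 0 & 1 \\ -2 & 0 \end{Vmatrix}\begin{Vmatrix} 1 & 1 \\ 1 & 2 \end{Vmatrix} = \begin{Vmatrix} 1 & 2 \\ -2 & -2 \end{Vmatrix}, \qquad A_0^{(1)}B_0^{-1}B(\lambda) = \begin{Vmatrix} 1 & \lambda^2 + 5 \\ -2\lambda^2 - 4 & -6 \end{Vmatrix},$$

$$R(\lambda) = A^{(1)}(\lambda) - A_0^{(1)}B_0^{-1}B(\lambda) = \begin{Vmatrix} -3\lambda - 1 & -13\lambda - 5 \\ -\lambda + 5 & -11\lambda + 6 \end{Vmatrix},$$

$$Q(\lambda) = \begin{Vmatrix} 3 & 5 \\ 2 & 5 \end{Vmatrix}\lambda + \begin{Vmatrix} 1 & 2 \\ -2 & -2 \end{Vmatrix} = \begin{Vmatrix} 3\lambda + 1 & 5\lambda + 2 \\ 2\lambda - 2 & 5\lambda - 2 \end{Vmatrix}.$$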


As an exercise, the reader should verify that

$$A(\lambda) = Q(\lambda)B(\lambda) + R(\lambda).$$


§ 3. The Generalized Bézout Theorem 


1. We consider an arbitrary matrix polynomial of order $n$

$$F(\lambda) = F_0\lambda^m + F_1\lambda^{m-1} + \cdots + F_m \qquad (F_0 \neq O). \tag{14}$$


§ 3. THE GENERALIZED BÉZOUT THEOREM 81

This polynomial can also be written as follows:

$$F(\lambda) = \lambda^mF_0 + \lambda^{m-1}F_1 + \cdots + F_m. \tag{15}$$

For a scalar $\lambda$, both ways of writing give the same result. However, if we substitute for the scalar argument $\lambda$ a square matrix $A$ of order $n$, then the results of the substitution in (14) and (15) will, in general, be distinct, since the powers of $A$ need not be permutable with the matrix coefficients $F_0, F_1, \ldots, F_m$.

We set

$$F(A) = F_0A^m + F_1A^{m-1} + \cdots + F_m \tag{16}$$

and

$$\hat{F}(A) = A^mF_0 + A^{m-1}F_1 + \cdots + F_m, \tag{17}$$

and call $F(A)$ the right value and $\hat{F}(A)$ the left value of $F(\lambda)$ on substitution of $A$ for $\lambda$.²
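We divide $F(\lambda)$ by the binomial $\lambda E - A$. In this case the right remainder $R$ and left remainder $\hat{R}$ will not depend on $\lambda$. To determine the right remainder we use the usual division scheme:

$$\begin{aligned}
F(\lambda) &= F_0\lambda^m + F_1\lambda^{m-1} + \cdots + F_m\\
&= F_0\lambda^{m-1}(\lambda E - A) + (F_0A + F_1)\lambda^{m-1} + F_2\lambda^{m-2} + \cdots\\
&= [F_0\lambda^{m-1} + (F_0A + F_1)\lambda^{m-2}](\lambda E - A) + (F_0A^2 + F_1A + F_2)\lambda^{m-2} + F_3\lambda^{m-3} + \cdots\\
&= [F_0\lambda^{m-1} + (F_0A + F_1)\lambda^{m-2} + \cdots + (F_0A^{m-1} + F_1A^{m-2} + \cdots + F_{m-1})](\lambda E - A)\\
&\quad + F_0A^m + F_1A^{m-1} + \cdots + F_m.
\end{aligned}$$

Thus we have found that

$$R = F_0A^m + F_1A^{m-1} + \cdots + F_m = F(A). \tag{18}$$

Similarly

$$\hat{R} = \hat{F}(A). \tag{19}$$

This proves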
This proves 


THEOREM 1 (The Generalized Bézout Theorem): When the matrix polynomial $F(\lambda)$ is divided on the right by the binomial $\lambda E - A$, the remainder is $F(A)$; when it is divided on the left, the remainder is $\hat{F}(A)$.
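The derivation above is synthetic division in disguise: the quotient coefficients are the partial sums $F_0$, $F_0A + F_1$, $F_0A^2 + F_1A + F_2$, .... A minimal Python sketch (function names are ours, chosen for illustration; the matrices are arbitrary test data) divides $F(\lambda)$ on the right by $\lambda E - A$, checks that the remainder equals the right value (16), and shows that the left value (17) can differ when the coefficients do not commute with $A$.

```python
def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def mat_add(x, y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def mat_pow(a, k):
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, a)
    return result


def divide_right_by_linear(coeffs, a):
    """Divide F(λ) = F_0 λ^m + ... + F_m (m >= 1) on the right by λE - A.

    Returns [Q_0, ..., Q_{m-1}] and R, where Q_0 = F_0, Q_k = Q_{k-1} A + F_k,
    and R = Q_{m-1} A + F_m.
    """
    quotient = [coeffs[0]]
    for f in coeffs[1:-1]:
        quotient.append(mat_add(mat_mul(quotient[-1], a), f))
    remainder = mat_add(mat_mul(quotient[-1], a), coeffs[-1])
    return quotient, remainder


def right_value(coeffs, a):
    """F(A) = F_0 A^m + F_1 A^(m-1) + ... + F_m, formula (16)."""
    m = len(coeffs) - 1
    total = coeffs[-1]
    for j, f in enumerate(coeffs[:-1]):
        total = mat_add(total, mat_mul(f, mat_pow(a, m - j)))
    return total


def left_value(coeffs, a):
    """Left value A^m F_0 + A^(m-1) F_1 + ... + F_m, formula (17)."""
    m = len(coeffs) - 1
    total = coeffs[-1]
    for j, f in enumerate(coeffs[:-1]):
        total = mat_add(total, mat_mul(mat_pow(a, m - j), f))
    return total


F = [[[0, 1], [0, 0]], [[1, 0], [0, 0]], [[0, 0], [1, 0]]]   # F_0, F_1, F_2 (m = 2)
A = [[1, 2], [3, 4]]
Q, R = divide_right_by_linear(F, A)
print(R == right_value(F, A))    # True
print(right_value(F, A))         # [[16, 24], [1, 0]]
print(left_value(F, A))          # [[1, 7], [4, 15]]
```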


2 In the 'right' value $F(A)$ the powers of $A$ are at the right of the coefficients; in the 'left' value $\hat{F}(A)$, at the left.




2. From this theorem it follows that:

A polynomial $F(\lambda)$ is divisible by the binomial $\lambda E - A$ on the right (left) without remainder if and only if $F(A) = O$ ($\hat{F}(A) = O$).

Example. Let $A = \| a_{ik} \|_1^n$ and let $f(\lambda)$ be a polynomial in $\lambda$. Then

$$F(\lambda) = f(\lambda)E - f(A)$$

is divisible by $\lambda E - A$ (both on the right and on the left) without remainder. This follows immediately from the generalized Bézout Theorem, because in this case $F(A) = \hat{F}(A) = O$.


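§ 4. The Characteristic Polynomial of a Matrix. The Adjoint Matrix

1. We consider a matrix $A = \| a_{ik} \|_1^n$. The characteristic matrix of $A$ is $\lambda E - A$. The determinant of the characteristic matrix

$$\Delta(\lambda) = |\lambda E - A| = |\lambda\delta_{ik} - a_{ik}|_1^n$$

is a scalar polynomial in $\lambda$ and is called the characteristic polynomial of $A$ (see Chapter III, § 7).³

The matrix $B(\lambda) = \| b_{ik}(\lambda) \|_1^n$, where $b_{ik}(\lambda)$ is the algebraic complement of the element $\lambda\delta_{ki} - a_{ki}$ in the determinant $\Delta(\lambda)$, is called the adjoint matrix of $A$.

By way of example, for the matrix

$$A = \begin{Vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{Vmatrix}$$

we have:

$$\lambda E - A = \begin{Vmatrix} \lambda - a_{11} & -a_{12} & -a_{13} \\ -a_{21} & \lambda - a_{22} & -a_{23} \\ -a_{31} & -a_{32} & \lambda - a_{33} \end{Vmatrix},$$

$$\Delta(\lambda) = |\lambda E - A| = \lambda^3 - (a_{11} + a_{22} + a_{33})\lambda^2 + \cdots,$$

$$B(\lambda) = \begin{Vmatrix} \lambda^2 - (a_{22} + a_{33})\lambda + a_{22}a_{33} - a_{23}a_{32} & \ast & \ast \\ a_{21}\lambda + a_{23}a_{31} - a_{21}a_{33} & \ast & \ast \\ a_{31}\lambda + a_{21}a_{32} - a_{22}a_{31} & \ast & \ast \end{Vmatrix}.$$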


3 This polynomial differs by the factor $(-1)^n$ from the polynomial $|A - \lambda E|$ introduced in Chapter III, § 7.


§ 4. CHARACTERISTIC POLYNOMIAL OF A MATRIX. ADJOINT MATRIX 83
These definitions imply the following identities in $\lambda$:

$$(\lambda E - A)B(\lambda) = \Delta(\lambda)E, \tag{20}$$
$$B(\lambda)(\lambda E - A) = \Delta(\lambda)E. \tag{20'}$$

The right-hand sides of these equations can be regarded as polynomials with matrix coefficients (each of these coefficients is the product of a scalar and the unit matrix $E$). The polynomial matrix $B(\lambda)$ can also be represented in the form of a polynomial arranged with respect to the powers of $\lambda$. Equations (20) and (20') show that $\Delta(\lambda)E$ is divisible on the right and on the left by $\lambda E - A$ without remainder. By the Generalized Bézout Theorem, this is only possible when the remainder $\Delta(A)$ is the null matrix. Thus we have proved:


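THEOREM 2 (Hamilton-Cayley): Every square matrix $A$ satisfies its characteristic equation, i.e.,

$$\Delta(A) = O. \tag{21}$$

Example.

$$A = \begin{Vmatrix} 2 & -1 \\ 1 & 3 \end{Vmatrix}, \qquad \Delta(\lambda) = \begin{vmatrix} \lambda - 2 & 1 \\ -1 & \lambda - 3 \end{vmatrix} = \lambda^2 - 5\lambda + 7,$$

$$\Delta(A) = A^2 - 5A + 7E = \begin{Vmatrix} 3 & -5 \\ 5 & 8 \end{Vmatrix} - \begin{Vmatrix} 10 & -5 \\ 5 & 15 \end{Vmatrix} + \begin{Vmatrix} 7 & 0 \\ 0 & 7 \end{Vmatrix} = \begin{Vmatrix} 0 & 0 \\ 0 & 0 \end{Vmatrix}.$$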


2. We denote by $\lambda_1, \lambda_2, \ldots, \lambda_n$ all the characteristic values of $A$, i.e., all the roots of the characteristic polynomial $\Delta(\lambda)$ (each $\lambda_i$ is repeated as often as its multiplicity as a root of $\Delta(\lambda)$ requires). Then

$$\Delta(\lambda) = |\lambda E - A| = (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n). \tag{22}$$

Let $g(\mu)$ be an arbitrary scalar polynomial. We wish to find the characteristic values of $g(A)$. For this purpose we split $g(\mu)$ into linear factors

$$g(\mu) = a_0(\mu - \mu_1)(\mu - \mu_2)\cdots(\mu - \mu_l). \tag{23}$$

On both sides of this identity we substitute the matrix $A$ for $\mu$:

$$g(A) = a_0(A - \mu_1E)(A - \mu_2E)\cdots(A - \mu_lE). \tag{24}$$


Passing to determinants on both sides of (24) and using (22) and (23) we find

$$\begin{aligned}
|g(A)| &= a_0^n\,|A - \mu_1E|\,|A - \mu_2E| \cdots |A - \mu_lE|\\
&= (-1)^{nl}a_0^n\,\Delta(\mu_1)\Delta(\mu_2)\cdots\Delta(\mu_l)\\
&= a_0^n \prod_{i=1}^{l}\prod_{k=1}^{n}(\lambda_k - \mu_i) = g(\lambda_1)g(\lambda_2)\cdots g(\lambda_n).
\end{aligned}$$

If in the equation

$$|g(A)| = g(\lambda_1)g(\lambda_2)\cdots g(\lambda_n) \tag{25}$$


we replace the polynomial $g(\mu)$ by $\lambda - g(\mu)$, where $\lambda$ is some parameter, we find:

$$|\lambda E - g(A)| = [\lambda - g(\lambda_1)][\lambda - g(\lambda_2)]\cdots[\lambda - g(\lambda_n)]. \tag{26}$$

This leads to the following theorem.

THEOREM 3: If $\lambda_1, \lambda_2, \ldots, \lambda_n$ are all the characteristic values (with the proper multiplicities) of a matrix $A$ and if $g(\mu)$ is a scalar polynomial, then $g(\lambda_1), g(\lambda_2), \ldots, g(\lambda_n)$ are the characteristic values of $g(A)$.

In particular, if $A$ has the characteristic values $\lambda_1, \lambda_2, \ldots, \lambda_n$, then $A^k$ has the characteristic values $\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k$ ($k = 0, 1, 2, \ldots$).
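Formula (25) can be spot-checked with exact integer arithmetic. The sketch below takes the matrix $A$ of the example at the end of this section, whose characteristic polynomial there is $(\lambda - 1)^2(\lambda - 2)$, so its characteristic values are $1, 1, 2$, and compares $|g(A)|$ with $g(1)\,g(1)\,g(2)$ for a few polynomials $g$. The function names and the chosen polynomials are illustrative assumptions.

```python
from math import prod


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def poly_at_scalar(coeffs, u):
    """Horner evaluation of g(u) = c_0 u^l + ... + c_l."""
    value = 0
    for c in coeffs:
        value = value * u + c
    return value


def poly_at_matrix(coeffs, a):
    """Horner evaluation of g(A) = c_0 A^l + ... + c_l E."""
    n = len(a)
    value = [[0] * n for _ in range(n)]
    for c in coeffs:
        value = mat_mul(value, a)
        value = [[value[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return value


A = [[2, -1, 1], [0, 1, 1], [-1, 1, 1]]
char_values = [1, 1, 2]
for g in ([1, 0, 1], [1, -3], [1, 0, -2, 5]):      # u^2 + 1, u - 3, u^3 - 2u + 5
    lhs = det(poly_at_matrix(g, A))
    rhs = prod(poly_at_scalar(g, lam) for lam in char_values)
    print(g, lhs, rhs)                              # the two numbers agree in each line
```

For $g(\mu) = \mu^2 + 1$ both sides equal $2 \cdot 2 \cdot 5 = 20$.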
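3. We shall now derive an effective formula expressing the adjoint matrix $B(\lambda)$ in terms of the characteristic polynomial $\Delta(\lambda)$.

Let

$$\Delta(\lambda) = \lambda^n - p_1\lambda^{n-1} - p_2\lambda^{n-2} - \cdots - p_n. \tag{27}$$

The difference $\Delta(\lambda) - \Delta(\mu)$ is divisible by $\lambda - \mu$ without remainder. Therefore

$$\delta(\lambda, \mu) = \frac{\Delta(\lambda) - \Delta(\mu)}{\lambda - \mu} = \lambda^{n-1} + (\mu - p_1)\lambda^{n-2} + (\mu^2 - p_1\mu - p_2)\lambda^{n-3} + \cdots + (\mu^{n-1} - p_1\mu^{n-2} - \cdots - p_{n-1}) \tag{28}$$

is a polynomial in $\lambda$ and $\mu$.

The identity

$$\Delta(\lambda) - \Delta(\mu) = \delta(\lambda, \mu)(\lambda - \mu) \tag{29}$$

will still hold if we replace $\lambda$ and $\mu$ by the permutable matrices $\lambda E$ and $A$. Since by the Hamilton-Cayley Theorem $\Delta(A) = O$,

$$\Delta(\lambda)E = \delta(\lambda E, A)(\lambda E - A). \tag{30}$$

Comparing (20') with (30), we obtain by virtue of the uniqueness of the quotient the required formula

$$B(\lambda) = \delta(\lambda E, A). \tag{31}$$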




Hence by (28)

$$B(\lambda) = E\lambda^{n-1} + B_1\lambda^{n-2} + B_2\lambda^{n-3} + \cdots + B_{n-1}, \tag{32}$$

where

$$B_1 = A - p_1E, \qquad B_2 = A^2 - p_1A - p_2E,$$

and, in general,

$$B_k = A^k - p_1A^{k-1} - p_2A^{k-2} - \cdots - p_kE \qquad (k = 1, 2, \ldots, n-1). \tag{33}$$

The matrices $B_1, B_2, \ldots, B_{n-1}$ can be computed in succession, starting from the recurrence relation

$$B_k = AB_{k-1} - p_kE \qquad (k = 1, 2, \ldots, n-1;\; B_0 = E). \tag{34}$$

Moreover,⁴

$$AB_{n-1} - p_nE = O. \tag{35}$$

The relations (34) and (35) follow immediately from (20) if we equate the coefficients of equal powers of $\lambda$ on both sides.

If $A$ is non-singular, then

$$p_n = (-1)^{n-1}|A| \neq 0,$$

and it follows from (35) that

$$A^{-1} = \frac{1}{p_n}B_{n-1}. \tag{36}$$

Let $\lambda_0$ be a characteristic value of $A$, so that $\Delta(\lambda_0) = 0$. Substituting the value $\lambda_0$ in (20), we find:

$$(\lambda_0E - A)B(\lambda_0) = O. \tag{37}$$

Let us assume that $B(\lambda_0) \neq O$ and denote by $b$ an arbitrary non-zero column of this matrix. Then from (37) we have $(\lambda_0E - A)b = o$ or

$$Ab = \lambda_0b. \tag{38}$$

Therefore every non-zero column of $B(\lambda_0)$ determines a characteristic vector corresponding to the characteristic value $\lambda_0$.⁵

Thus:
Thus: 


4 From (34) follows (33). If we substitute in (35) the expression for $B_{n-1}$ given in (33), we obtain $\Delta(A) = O$. This approach to the Hamilton-Cayley Theorem does not require the Generalized Bézout Theorem explicitly, but contains this theorem implicitly.

5 See Chapter III, § 7. If to the characteristic value $\lambda_0$ there correspond $d_0$ linearly independent characteristic vectors ($n - d_0$ is the rank of $\lambda_0E - A$), then the rank of $B(\lambda_0)$ does not exceed $d_0$. In particular, if only one characteristic direction corresponds to $\lambda_0$, then in $B(\lambda_0)$ the elements of any two columns are proportional.




If the coefficients of the characteristic polynomial are known, then the adjoint matrix can be found by formula (31). If the given matrix $A$ is non-singular, then the inverse matrix $A^{-1}$ can be found by formula (36). If $\lambda_0$ is a characteristic value of $A$, then the non-zero columns of $B(\lambda_0)$ are characteristic vectors of $A$ for $\lambda = \lambda_0$.


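Example.

$$A = \begin{Vmatrix} 2 & -1 & 1 \\ 0 & 1 & 1 \\ -1 & 1 & 1 \end{Vmatrix},$$

$$\Delta(\lambda) = |\lambda E - A| = \begin{vmatrix} \lambda - 2 & 1 & -1 \\ 0 & \lambda - 1 & -1 \\ 1 & -1 & \lambda - 1 \end{vmatrix} = \lambda^3 - 4\lambda^2 + 5\lambda - 2,$$

$$\delta(\lambda, \mu) = \frac{\Delta(\lambda) - \Delta(\mu)}{\lambda - \mu} = \lambda^2 + \lambda(\mu - 4) + \mu^2 - 4\mu + 5,$$

$$B(\lambda) = \delta(\lambda E, A) = E\lambda^2 + \lambda\underbrace{(A - 4E)}_{B_1} + \underbrace{A^2 - 4A + 5E}_{B_2}.$$

But

$$B_1 = A - 4E = \begin{Vmatrix} -2 & -1 & 1 \\ 0 & -3 & 1 \\ -1 & 1 & -3 \end{Vmatrix}, \qquad B_2 = AB_1 + 5E = \begin{Vmatrix} 0 & 2 & -2 \\ -1 & 3 & -2 \\ 1 & -1 & 2 \end{Vmatrix},$$

$$B(\lambda) = \begin{Vmatrix} \lambda^2 - 2\lambda & -\lambda + 2 & \lambda - 2 \\ -1 & \lambda^2 - 3\lambda + 3 & \lambda - 2 \\ -\lambda + 1 & \lambda - 1 & \lambda^2 - 3\lambda + 2 \end{Vmatrix},$$

$$|A| = 2, \qquad A^{-1} = \frac{1}{2}B_2 = \begin{Vmatrix} 0 & 1 & -1 \\ -\tfrac12 & \tfrac32 & -1 \\ \tfrac12 & -\tfrac12 & 1 \end{Vmatrix}.$$

Furthermore,

$$\Delta(\lambda) = (\lambda - 1)^2(\lambda - 2).$$

The first column of the matrix $B(1)$ gives the characteristic vector $(-1, -1, 0)$ for the characteristic value $\lambda = 1$.

The first column of the matrix $B(2)$ gives the characteristic vector $(0, -1, -1)$ corresponding to the characteristic value $\lambda = 2$.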
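The computations of this example can be reproduced mechanically. A minimal sketch, assuming the coefficients $p_1 = 4$, $p_2 = -5$, $p_3 = 2$ read off from $\Delta(\lambda) = \lambda^3 - 4\lambda^2 + 5\lambda - 2$ by (27): the hypothetical helper `adjoint_coefficients` runs the recurrence (34) and checks (35), `adjoint_at` evaluates (32), and the inverse comes from (36).

```python
from fractions import Fraction


def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def minus_scalar(m, s):
    """M - sE."""
    return [[v - (s if i == j else 0) for j, v in enumerate(row)] for i, row in enumerate(m)]


def adjoint_coefficients(a, p):
    """[B_0, B_1, ..., B_{n-1}] from B_k = A B_{k-1} - p_k E, B_0 = E, formula (34)."""
    n = len(a)
    bs = [identity(n)]
    for k in range(1, n):
        bs.append(minus_scalar(mat_mul(a, bs[-1]), p[k - 1]))
    # Formula (35): A B_{n-1} - p_n E must be the null matrix.
    assert minus_scalar(mat_mul(a, bs[-1]), p[n - 1]) == [[0] * n for _ in range(n)]
    return bs


def adjoint_at(bs, lam):
    """B(λ) = E λ^{n-1} + B_1 λ^{n-2} + ... + B_{n-1}, formula (32)."""
    n = len(bs)
    total = [[0] * n for _ in range(n)]
    for k, b in enumerate(bs):
        total = [[t + v * lam ** (n - 1 - k) for t, v in zip(tr, br)]
                 for tr, br in zip(total, b)]
    return total


A = [[2, -1, 1], [0, 1, 1], [-1, 1, 1]]
p = [4, -5, 2]
bs = adjoint_coefficients(A, p)
A_inv = [[Fraction(v, p[-1]) for v in row] for row in bs[-1]]     # formula (36)
for lam0 in (1, 2):
    column = [row[0] for row in adjoint_at(bs, lam0)]             # first column of B(λ0)
    print(lam0, column, [sum(A[i][k] * column[k] for k in range(3)) for i in range(3)])
```

For $\lambda_0 = 1$ the printed product $A b$ equals $b = (-1, -1, 0)$, and for $\lambda_0 = 2$ it equals $2b$ with $b = (0, -1, -1)$, as stated above.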


§ 5. THE METHOD OF FADDEEV 87


§ 5. The Method of Faddeev for the Simultaneous Computation of the Coefficients of the Characteristic Polynomial and of the Adjoint Matrix

1. D. K. Faddeev⁶ has suggested a method for the simultaneous determination of the scalar coefficients $p_1, p_2, \ldots, p_n$ of the characteristic polynomial

$$\Delta(\lambda) = \lambda^n - p_1\lambda^{n-1} - p_2\lambda^{n-2} - \cdots - p_n \tag{39}$$

and of the matrix coefficients $B_1, B_2, \ldots, B_{n-1}$ of the adjoint matrix $B(\lambda)$.


In order to explain the method of Faddeev⁷ we introduce the concept of the trace (or spur) of a matrix.

By the trace $\operatorname{tr} A$ of a matrix $A = \| a_{ik} \|_1^n$ we mean the sum of the diagonal elements of the matrix:

$$\operatorname{tr} A = \sum_{i=1}^{n} a_{ii}. \tag{40}$$

It is easy to see that

$$\operatorname{tr} A = p_1 = \sum_{i=1}^{n} \lambda_i \tag{41}$$

if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the characteristic values of $A$, i.e., if

$$\Delta(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n). \tag{42}$$

Since by Theorem 3 $A^k$ has the characteristic values $\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k$ ($k = 0, 1, 2, \ldots$), we have

$$\operatorname{tr} A^k = s_k = \sum_{i=1}^{n} \lambda_i^k \qquad (k = 0, 1, 2, \ldots). \tag{43}$$

The sums $s_k$ ($k = 1, 2, \ldots, n$) of powers of the roots of the polynomial (39) are connected with the coefficients by Newton's formulas⁸

$$kp_k = s_k - p_1s_{k-1} - \cdots - p_{k-1}s_1 \qquad (k = 1, 2, \ldots, n). \tag{44}$$

If the traces $s_1, s_2, \ldots, s_n$ of the matrices $A, A^2, \ldots, A^n$ are computed, then the coefficients $p_1, p_2, \ldots, p_n$ can be determined from (44). This is the method of Leverrier for the determination of the coefficients of the characteristic polynomial from the traces of the powers of the matrix.
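Leverrier's method is short enough to state as code. In this sketch (our own; the function name `leverrier` is hypothetical, and exact rational arithmetic is an assumption made so that the division by $k$ in (44) is exact) the traces $s_k = \operatorname{tr} A^k$ are accumulated and (44) is solved for $p_k$ step by step.

```python
from fractions import Fraction


def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def leverrier(a):
    """p_1..p_n of Δ(λ) = λ^n - p_1 λ^(n-1) - ... - p_n via Newton's formulas (44)."""
    n = len(a)
    s = [None]                 # s[k] = tr A^k; index 0 unused
    power = a
    for _ in range(n):
        s.append(sum(power[i][i] for i in range(n)))
        power = mat_mul(power, a)
    p = [None]                 # p[k]; index 0 unused
    for k in range(1, n + 1):
        # k p_k = s_k - p_1 s_{k-1} - ... - p_{k-1} s_1
        p.append(Fraction(s[k] - sum(p[i] * s[k - i] for i in range(1, k)), k))
    return p[1:]


A = [[2, -1, 1], [0, 1, 1], [-1, 1, 1]]
print(leverrier(A))   # [Fraction(4, 1), Fraction(-5, 1), Fraction(2, 1)]
```

These are the coefficients of $\Delta(\lambda) = \lambda^3 - 4\lambda^2 + 5\lambda - 2$ from the example of § 4. Computing all powers of $A$ is the costly step that Faddeev's variant below avoids.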


2. Faddeev has proposed to compute successively, instead of the traces of the powers $A, A^2, \ldots, A^n$, the traces of certain other matrices $A_1, A_2, \ldots, A_n$

6 See [14], p. 160.

7 In Chapter VII, § 8, we shall discuss another effective method, due to A. N. Krylov, of computing the coefficients of the characteristic polynomial.

8 See, for example, G. Chrystal, Textbook of Algebra, Vol. I, pp. 436ff.




and so to determine $p_1, p_2, \ldots, p_n$ and $B_1, B_2, \ldots, B_n$ by the following formulas:

$$\begin{array}{lll}
A_1 = A, & p_1 = \operatorname{tr} A_1, & B_1 = A_1 - p_1E,\\
A_2 = AB_1, & p_2 = \tfrac{1}{2}\operatorname{tr} A_2, & B_2 = A_2 - p_2E,\\
\cdots & \cdots & \cdots\\
A_{n-1} = AB_{n-2}, & p_{n-1} = \tfrac{1}{n-1}\operatorname{tr} A_{n-1}, & B_{n-1} = A_{n-1} - p_{n-1}E,\\
A_n = AB_{n-1}, & p_n = \tfrac{1}{n}\operatorname{tr} A_n, & B_n = A_n - p_nE = O.
\end{array} \tag{45}$$

The last equation $B_n = A_n - p_nE = O$ may be used to check the computation.

In order to convince ourselves that the numbers $p_1, p_2, \ldots, p_n$ and the matrices $B_1, B_2, \ldots, B_{n-1}$ that are determined successively by (45) are, in fact, the coefficients of $\Delta(\lambda)$ and $B(\lambda)$, we note that the following formulas for $A_k$ and $B_k$ ($k = 1, 2, \ldots, n$) follow from (45):

$$A_k = A^k - p_1A^{k-1} - \cdots - p_{k-1}A, \qquad B_k = A^k - p_1A^{k-1} - \cdots - p_{k-1}A - p_kE. \tag{46}$$

Equating the traces on the left-hand and right-hand sides of the first of these formulas, we obtain

$$kp_k = s_k - p_1s_{k-1} - \cdots - p_{k-1}s_1.$$

But these formulas coincide with Newton's formulas (44) by which the coefficients of the characteristic polynomial $\Delta(\lambda)$ are determined successively. Therefore the numbers $p_1, p_2, \ldots, p_n$ determined by (45) are also the coefficients of $\Delta(\lambda)$. But then the second of formulas (46) coincides with formulas (33) by which the matrix coefficients $B_1, B_2, \ldots, B_{n-1}$ of the adjoint matrix $B(\lambda)$ are determined. Therefore, formulas (45) also determine the coefficients $B_1, B_2, \ldots, B_{n-1}$ of the matrix polynomial $B(\lambda)$.
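The recursion (45) needs only matrix products and traces. A minimal sketch (the function name `faddeev` and the exact-fraction representation are our own choices) returning $p_1, \ldots, p_n$ together with $B_1, \ldots, B_n$; the vanishing of $B_n$ is the check mentioned after (45), and the data are those of the example that follows.

```python
from fractions import Fraction


def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def faddeev(a):
    """Faddeev's method (45): returns ([p_1, ..., p_n], [B_1, ..., B_n])."""
    n = len(a)
    b = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]   # B_0 = E, so A_1 = A
    ps, bs = [], []
    for k in range(1, n + 1):
        ak = mat_mul(a, b)                                   # A_k = A B_{k-1}
        pk = sum(ak[i][i] for i in range(n)) / k             # p_k = tr(A_k) / k
        b = [[ak[i][j] - (pk if i == j else 0) for j in range(n)] for i in range(n)]
        ps.append(pk)
        bs.append(b)
    return ps, bs


A = [[2, -1, 1, 2], [0, 1, 1, 0], [-1, 1, 1, 1], [1, 1, 1, 0]]
ps, bs = faddeev(A)
assert all(v == 0 for row in bs[-1] for v in row)           # B_n = O
A_inv = [[v / ps[-1] for v in row] for row in bs[-2]]        # A^{-1} = B_{n-1} / p_n, cf. (36)
print(ps)   # [Fraction(4, 1), Fraction(-2, 1), Fraction(-5, 1), Fraction(-2, 1)]
```

Compared with Leverrier's method, each step costs one matrix product, and the adjoint coefficients $B_k$ come out as a by-product.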
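Example.⁹

$$A = \begin{Vmatrix} 2 & -1 & 1 & 2 \\ 0 & 1 & 1 & 0 \\ -1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \end{Vmatrix} \quad (\text{column sums: } 2,\ 2,\ 4,\ 3), \qquad p_1 = \operatorname{tr} A = 4, \qquad B_1 = A - 4E = \begin{Vmatrix} -2 & -1 & 1 & 2 \\ 0 & -3 & 1 & 0 \\ -1 & 1 & -3 & 1 \\ 1 & 1 & 1 & -4 \end{Vmatrix},$$

$$A_2 = AB_1 = \begin{Vmatrix} -3 & 4 & 0 & -3 \\ -1 & -2 & -2 & 1 \\ 2 & 0 & -2 & -5 \\ -3 & -3 & -1 & 3 \end{Vmatrix} \quad (\text{column sums: } -5,\ -1,\ -5,\ -4), \qquad p_2 = \tfrac12 \operatorname{tr} A_2 = -2, \qquad B_2 = A_2 + 2E = \begin{Vmatrix} -1 & 4 & 0 & -3 \\ -1 & 0 & -2 & 1 \\ 2 & 0 & 0 & -5 \\ -3 & -3 & -1 & 5 \end{Vmatrix},$$

$$A_3 = AB_2 = \begin{Vmatrix} -5 & 2 & 0 & -2 \\ 1 & 0 & -2 & -4 \\ -1 & -7 & -3 & 4 \\ 0 & 4 & -2 & -7 \end{Vmatrix} \quad (\text{column sums: } -5,\ -1,\ -7,\ -9), \qquad p_3 = \tfrac13 \operatorname{tr} A_3 = -5, \qquad B_3 = A_3 + 5E = \begin{Vmatrix} 0 & 2 & 0 & -2 \\ 1 & 5 & -2 & -4 \\ -1 & -7 & 2 & 4 \\ 0 & 4 & -2 & -2 \end{Vmatrix},$$

$$A_4 = AB_3 = \begin{Vmatrix} -2 & 0 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & -2 & 0 \\ 0 & 0 & 0 & -2 \end{Vmatrix}, \qquad p_4 = \tfrac14 \operatorname{tr} A_4 = -2,$$

$$\Delta(\lambda) = \lambda^4 - 4\lambda^3 + 2\lambda^2 + 5\lambda + 2,$$

$$|A| = 2, \qquad A^{-1} = \frac{1}{p_4}B_3 = \begin{Vmatrix} 0 & -1 & 0 & 1 \\ -\tfrac12 & -\tfrac52 & 1 & 2 \\ \tfrac12 & \tfrac72 & -1 & -2 \\ 0 & -2 & 1 & 1 \end{Vmatrix}.$$

9 As a check on the computation, we write under each matrix $A_1, A_2, A_3$ a row whose elements are the sums of the elements above it. The product of this row of 'column-sums' of the first factor into the columns of the second factor must give the elements of the column-sum of the product.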


Note. If we wish to determine $p_1, p_2, p_3, p_4$ and only the first columns of $B_1, B_2, B_3$, it is sufficient to compute in A, the elements of the first column
and only the diagonal elements of the remaining columns, in A, only the 
elements of the first column, and in A, only the first two elements of the 


first column. 


§ 6. The Minimal Polynomial of a Matrix 


1. DEFINITION 1: A scalar polynomial $f(\lambda)$ is called an annihilating polynomial of the square matrix $A$ if

$$f(A) = O.$$

An annihilating polynomial $\psi(\lambda)$ of least degree with highest coefficient 1 is called a minimal polynomial of $A$.

By the Hamilton-Cayley Theorem the characteristic polynomial $\Delta(\lambda)$ is an annihilating polynomial of $A$. However, as we shall show below, it is not, in general, a minimal polynomial.

Let us divide an arbitrary annihilating polynomial $f(\lambda)$ by a minimal polynomial $\psi(\lambda)$:

$$f(\lambda) = \psi(\lambda)q(\lambda) + r(\lambda),$$

where the degree of $r(\lambda)$ is less than that of $\psi(\lambda)$. Hence we have:

$$f(A) = \psi(A)q(A) + r(A).$$

Since $f(A) = O$ and $\psi(A) = O$, it follows that $r(A) = O$. But the degree of $r(\lambda)$ is less than that of the minimal polynomial $\psi(\lambda)$. Therefore $r(\lambda) \equiv 0$.¹⁰ Hence: Every annihilating polynomial of a matrix is divisible without remainder by the minimal polynomial.

Let $\psi_1(\lambda)$ and $\psi_2(\lambda)$ be two minimal polynomials of one and the same matrix. Then each is divisible without remainder by the other, i.e., the polynomials differ by a constant factor. This constant factor must be 1, because the highest coefficients in $\psi_1(\lambda)$ and $\psi_2(\lambda)$ are 1. Thus we have proved the uniqueness of the minimal polynomial of a given matrix $A$.


2. We shall now derive a formula connecting the minimal polynomial with 
the characteristic polynomial. 

We denote by $D_{n-1}(\lambda)$ the greatest common divisor of all the minors of order $n-1$ of the characteristic matrix $\lambda E - A$, i.e., of all the elements of the matrix $B(\lambda) = \| b_{ik}(\lambda) \|_1^n$ (see the preceding section). Then


$$B(\lambda) = D_{n-1}(\lambda)\, C(\lambda), \tag{47}$$


where $C(\lambda)$ is a certain polynomial matrix, the 'reduced' adjoint matrix of $\lambda E - A$. From (20) and (47) we have:


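$$\Delta(\lambda) E = (\lambda E - A)\, C(\lambda)\, D_{n-1}(\lambda). \tag{48}$$
Hence it follows that $\Delta(\lambda)$ is divisible without remainder by $D_{n-1}(\lambda)$:¹¹
$$\frac{\Delta(\lambda)}{D_{n-1}(\lambda)} = \psi(\lambda), \tag{49}$$


where $\psi(\lambda)$ is some polynomial. The factor $D_{n-1}(\lambda)$ in (48) may be cancelled on both sides:¹²


$$\psi(\lambda) E = (\lambda E - A)\, C(\lambda). \tag{50}$$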


10 Otherwise there would exist an annihilating polynomial of degree less than that of 
the minimal polynomial. 


11 We could also verify this immediately by expanding the characteristic determinant $\Delta(\lambda)$ with respect to the elements of an arbitrary row.


12 In this case we have, apart from (50), also the identity (see (20'))
$$\psi(\lambda) E = C(\lambda)(\lambda E - A),$$


i.e., $C(\lambda)$ is at one and the same time the left quotient and right quotient of $\psi(\lambda)E$ on division by $\lambda E - A$.




Since $\psi(\lambda)E$ is divisible on the left without remainder by $\lambda E - A$, it follows by the Generalized Bézout Theorem that


$$\psi(A) = O.$$


Thus, the polynomial $\psi(\lambda)$ defined by (49) is an annihilating polynomial of $A$. Let us show that it is the minimal polynomial.

We denote the minimal polynomial by $\psi^*(\lambda)$. Then $\psi(\lambda)$ is divisible by $\psi^*(\lambda)$ without remainder:


$$\psi(\lambda) = \psi^*(\lambda)\, \chi(\lambda). \tag{51}$$


Since $\psi^*(A) = O$, by the Generalized Bézout Theorem the matrix polynomial $\psi^*(\lambda)E$ is divisible on the left by $\lambda E - A$ without remainder:


$$\psi^*(\lambda) E = (\lambda E - A)\, C^*(\lambda). \tag{52}$$
From (51) and (52) it follows that
$$\psi(\lambda) E = (\lambda E - A)\, C^*(\lambda)\, \chi(\lambda). \tag{53}$$


The identities (50) and (53) show that $C(\lambda)$ as well as $C^*(\lambda)\chi(\lambda)$ are left quotients of $\psi(\lambda)E$ on division by $\lambda E - A$. By the uniqueness of division


$$C(\lambda) = C^*(\lambda)\, \chi(\lambda).$$


Hence it follows that $\chi(\lambda)$ is a common divisor of all the elements of the polynomial matrix $C(\lambda)$. But, on the other hand, the greatest common divisor of all the elements of the reduced adjoint matrix $C(\lambda)$ is equal to 1, because the matrix was obtained from $B(\lambda)$ by division by $D_{n-1}(\lambda)$. Therefore $\chi(\lambda) = \text{const}$. Since the highest coefficients of $\psi(\lambda)$ and $\psi^*(\lambda)$ are equal, we have in (51) $\chi(\lambda) = 1$, i.e., $\psi(\lambda) = \psi^*(\lambda)$, and this is what we had to prove.

We have established the following formula for the minimal polynomial:


$$\psi(\lambda) = \frac{\Delta(\lambda)}{D_{n-1}(\lambda)}. \tag{54}$$
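Formula (54) can be turned into a small exact computation. The sketch below illustrates it under stated assumptions and is not a procedure taken from the book: polynomials are coefficient lists (constant term first) over the rationals, minors are expanded by Laplace's rule (adequate only for small $n$), and $D_{n-1}(\lambda)$ is normalized to be monic. All function names are hypothetical. It is checked on the $3 \times 3$ example at the end of this section.

```python
from fractions import Fraction


def trim(p):
    """Drop zero coefficients of the highest powers (keep at least one entry)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])


def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def pdivmod(p, q):
    """Euclidean division p = q * quot + rem over the rationals."""
    p = trim([Fraction(c) for c in p])
    q = trim([Fraction(c) for c in q])
    quot = [Fraction(0)] * max(len(p) - len(q) + 1, 1)
    while any(p) and len(p) >= len(q):
        shift = len(p) - len(q)
        c = p[-1] / q[-1]
        quot[shift] = c
        p = padd(p, [Fraction(0)] * shift + [-c * x for x in q])
    return trim(quot), p


def pgcd(p, q):
    """Monic greatest common divisor (the zero polynomial if both are zero)."""
    while any(q):
        p, q = q, pdivmod(p, q)[1]
    return [c / p[-1] for c in p] if any(p) else [Fraction(0)]


def pdet(M):
    """Determinant of a matrix of polynomials, expanded along row 0."""
    if len(M) == 1:
        return M[0][0]
    total = [Fraction(0)]
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        term = pmul(M[0][j], pdet(minor))
        total = padd(total, term if j % 2 == 0 else [-c for c in term])
    return total


def minimal_polynomial(A):
    """Return (Delta, D_{n-1}, psi) with psi = Delta / D_{n-1}, formula (54)."""
    n = len(A)
    # characteristic matrix lambda*E - A, entries as coefficient lists
    M = [[[Fraction(-A[i][j]), Fraction(1)] if i == j else [Fraction(-A[i][j])]
          for j in range(n)] for i in range(n)]
    delta = pdet(M)
    d = [Fraction(0)]
    for i in range(n):
        for j in range(n):
            minor = [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]
            d = pgcd(d, pdet(minor))
    psi, rem = pdivmod(delta, d)
    assert not any(rem)          # (49): Delta is divisible by D_{n-1}
    return delta, d, psi


A = [[3, -3, 2], [-1, 5, -2], [-1, 3, 0]]
delta, d, psi = minimal_polynomial(A)
```

For this matrix the sketch returns $\Delta(\lambda) = \lambda^3 - 8\lambda^2 + 20\lambda - 16$, $D_2(\lambda) = \lambda - 2$ and $\psi(\lambda) = \lambda^2 - 6\lambda + 8$, the values obtained by hand in the example below.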


3. For the reduced adjoint matrix $C(\lambda)$ we have a formula analogous to (31) (p. 84):
$$C(\lambda) = \Psi(\lambda E, A), \tag{55}$$


where the polynomial $\Psi(\lambda, \mu)$ is defined by the equation¹³
13 Formula (55) can be deduced in the same way as (31). On both sides of the identity $\psi(\lambda) - \psi(\mu) = (\lambda - \mu)\Psi(\lambda, \mu)$ we substitute for $\lambda$ and $\mu$ the matrices $\lambda E$ and $A$ and compare the matrix equation so obtained with (50).




$$\Psi(\lambda, \mu) = \frac{\psi(\mu) - \psi(\lambda)}{\mu - \lambda}. \tag{56}$$

Moreover,
$$(\lambda E - A)\, C(\lambda) = \psi(\lambda) E. \tag{57}$$


Going over to determinants on both sides of (57), we obtain
$$\Delta(\lambda)\, |C(\lambda)| = [\psi(\lambda)]^n. \tag{58}$$


Thus, $\Delta(\lambda)$ is divisible without remainder by $\psi(\lambda)$ and some power of $\psi(\lambda)$ is divisible without remainder by $\Delta(\lambda)$, i.e., the sets of all the distinct roots of the polynomials $\Delta(\lambda)$ and $\psi(\lambda)$ are equal. In other words: All the distinct characteristic values of $A$ are roots of $\psi(\lambda)$.


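If
$$\Delta(\lambda) = (\lambda - \lambda_1)^{n_1} (\lambda - \lambda_2)^{n_2} \cdots (\lambda - \lambda_s)^{n_s} \qquad (\lambda_i \neq \lambda_j \text{ for } i \neq j;\ n_i > 0;\ i, j = 1, 2, \ldots, s), \tag{59}$$
then
$$\psi(\lambda) = (\lambda - \lambda_1)^{m_1} (\lambda - \lambda_2)^{m_2} \cdots (\lambda - \lambda_s)^{m_s}, \tag{60}$$
where
$$0 < m_k \leq n_k \qquad (k = 1, 2, \ldots, s). \tag{61}$$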


4. We mention one further property of the matrix $C(\lambda)$. Let $\lambda_0$ be an arbitrary characteristic value of $A = \| a_{ik} \|_1^n$. Then $\psi(\lambda_0) = 0$ and therefore, by (57),

$$(\lambda_0 E - A)\, C(\lambda_0) = O. \tag{62}$$


Note that $C(\lambda_0) \neq O$ always holds, for otherwise all the elements of the reduced adjoint matrix $C(\lambda)$ would be divisible without remainder by $\lambda - \lambda_0$, and this is impossible.

We denote by $c$ an arbitrary non-zero column of $C(\lambda_0)$. Then from (62)


$$(\lambda_0 E - A)\, c = 0,$$
i.e.,
$$Ac = \lambda_0 c. \tag{63}$$


In other words, every non-zero column of $C(\lambda_0)$ (and such a column always exists) determines a characteristic vector for $\lambda = \lambda_0$.


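Example.
$$A = \begin{pmatrix} 3 & -3 & 2 \\ -1 & 5 & -2 \\ -1 & 3 & 0 \end{pmatrix},$$
$$\Delta(\lambda) = \begin{vmatrix} \lambda - 3 & 3 & -2 \\ 1 & \lambda - 5 & 2 \\ 1 & -3 & \lambda \end{vmatrix} = \lambda^3 - 8\lambda^2 + 20\lambda - 16 = (\lambda - 2)^2 (\lambda - 4),$$


$$\delta(\lambda, \mu) = \frac{\Delta(\mu) - \Delta(\lambda)}{\mu - \lambda} = \mu^2 + \mu(\lambda - 8) + \lambda^2 - 8\lambda + 20,$$


$$B(\lambda) = A^2 + (\lambda - 8)A + (\lambda^2 - 8\lambda + 20)E$$


$$= \begin{pmatrix} 10 & -18 & 12 \\ -6 & 22 & -12 \\ -6 & 18 & -8 \end{pmatrix} + (\lambda - 8)\begin{pmatrix} 3 & -3 & 2 \\ -1 & 5 & -2 \\ -1 & 3 & 0 \end{pmatrix} + (\lambda^2 - 8\lambda + 20)\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \lambda^2 - 5\lambda + 6 & -3\lambda + 6 & 2\lambda - 4 \\ -\lambda + 2 & \lambda^2 - 3\lambda + 2 & -2\lambda + 4 \\ -\lambda + 2 & 3\lambda - 6 & \lambda^2 - 8\lambda + 12 \end{pmatrix}.$$


All the elements of the matrix $B(\lambda)$ are divisible by $D_2(\lambda) = \lambda - 2$. Cancelling this factor, we have:


$$C(\lambda) = \begin{pmatrix} \lambda - 3 & -3 & 2 \\ -1 & \lambda - 1 & -2 \\ -1 & 3 & \lambda - 6 \end{pmatrix}$$
and


$$\psi(\lambda) = \frac{\Delta(\lambda)}{\lambda - 2} = (\lambda - 2)(\lambda - 4).$$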


In $C(\lambda)$ we substitute for $\lambda$ the value $\lambda_0 = 2$:


$$C(2) = \begin{pmatrix} -1 & -3 & 2 \\ -1 & 1 & -2 \\ -1 & 3 & -4 \end{pmatrix}.$$


The first column gives us the characteristic vector $(1, 1, 1)$ for $\lambda_0 = 2$. The second column gives us the characteristic vector $(-3, 1, 3)$ for the same characteristic value $\lambda_0 = 2$. The third column is a linear combination of the first two.

Similarly, setting $\lambda_0 = 4$, we find from the first column of the matrix $C(4)$ the characteristic vector $(1, -1, -1)$ corresponding to the characteristic value $\lambda_0 = 4$.

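The reader should note that $\psi(\lambda)$ and $C(\lambda)$ could have been determined by a different method.

To begin with, let us find $D_2(\lambda)$. $D_2(\lambda)$ can only have 2 and 4 as its roots. For $\lambda = 4$ the second-order minor


$$\begin{vmatrix} 1 & \lambda - 5 \\ 1 & -3 \end{vmatrix}$$


of $\Delta(\lambda)$ does not vanish. Therefore $D_2(4) \neq 0$. For $\lambda = 2$ the columns of $\Delta(\lambda)$ become proportional. Therefore all the minors of order two in $\Delta(\lambda)$ vanish for $\lambda = 2$: $D_2(2) = 0$. Since the minor computed above is of the first degree, $D_2(\lambda)$ cannot be divisible by $(\lambda - 2)^2$. Therefore


$$D_2(\lambda) = \lambda - 2.$$
Hence
$$\psi(\lambda) = \frac{\Delta(\lambda)}{\lambda - 2} = (\lambda - 2)(\lambda - 4) = \lambda^2 - 6\lambda + 8,$$
$$\Psi(\lambda, \mu) = \frac{\psi(\mu) - \psi(\lambda)}{\mu - \lambda} = \mu + \lambda - 6,$$


$$C(\lambda) = \Psi(\lambda E, A) = A + (\lambda - 6)E = \begin{pmatrix} \lambda - 3 & -3 & 2 \\ -1 & \lambda - 1 & -2 \\ -1 & 3 & \lambda - 6 \end{pmatrix}.$$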
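The facts used in this example can be checked mechanically. The sketch below (plain Python integers; the helper names are illustrative) verifies identity (57) with $C(\lambda) = A + (\lambda - 6)E$ at enough sample points to pin down polynomials of degree 2, confirms $\psi(A) = O$, and tests that the non-zero columns of $C(2)$ and $C(4)$ satisfy (63).

```python
A = [[3, -3, 2], [-1, 5, -2], [-1, 3, 0]]
n = len(A)


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scalar(c):
    """The matrix cE."""
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]


def C(lam):
    """Reduced adjoint matrix of this example: C(lambda) = A + (lambda - 6)E."""
    return [[A[i][j] + (lam - 6 if i == j else 0) for j in range(n)] for i in range(n)]


def psi(lam):
    return (lam - 2) * (lam - 4)


def char_matrix(lam):
    return [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]


# (57): both sides are polynomials of degree <= 2 in lambda, so agreement at
# three points already proves the identity; eleven points are checked.
identity_57 = all(matmul(char_matrix(lam), C(lam)) == scalar(psi(lam))
                  for lam in range(-3, 8))

# psi(A) = A^2 - 6A + 8E must be the zero matrix.
A2 = matmul(A, A)
psi_of_A = [[A2[i][j] - 6 * A[i][j] + 8 * (i == j) for j in range(n)] for i in range(n)]


def nonzero_columns(M):
    cols = [[M[i][j] for i in range(n)] for j in range(n)]
    return [c for c in cols if any(c)]


def is_characteristic_vector(v, lam0):
    """Check (63): A v = lam0 v."""
    Av = [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
    return Av == [lam0 * x for x in v]


vectors_2 = nonzero_columns(C(2))
vectors_4 = nonzero_columns(C(4))
```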


CHAPTER V 


FUNCTIONS OF MATRICES 


§ 1. Definition of a Function of a Matrix 


1. Let $A = \| a_{ik} \|_1^n$ be a square matrix and $f(\lambda)$ a function of a scalar argument $\lambda$. We wish to define what is to be meant by $f(A)$, i.e., we wish to extend the function $f(\lambda)$ to a matrix value of the argument.

We already know the solution of this problem in the simplest special case where $f(\lambda) = \gamma_0 \lambda^l + \gamma_1 \lambda^{l-1} + \cdots + \gamma_l$ is a polynomial in $\lambda$. In this case, $f(A) = \gamma_0 A^l + \gamma_1 A^{l-1} + \cdots + \gamma_l E$. Starting from this special case, we shall obtain a definition of $f(A)$ in the general case.


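We denote by
$$\psi(\lambda) = (\lambda - \lambda_1)^{m_1} (\lambda - \lambda_2)^{m_2} \cdots (\lambda - \lambda_s)^{m_s} \tag{1}$$
the minimal polynomial¹ of $A$ (where $\lambda_1, \lambda_2, \ldots, \lambda_s$ are all the distinct characteristic values of $A$). The degree of this polynomial is $m = \sum_{k=1}^{s} m_k$.
Let $g(\lambda)$ and $h(\lambda)$ be two polynomials such that
$$g(A) = h(A). \tag{2}$$


Then the difference $d(\lambda) = g(\lambda) - h(\lambda)$, as an annihilating polynomial for $A$, is divisible by $\psi(\lambda)$ without remainder; we shall write this as follows:


$$g(\lambda) \equiv h(\lambda) \pmod{\psi(\lambda)}. \tag{3}$$
Hence by (1)


$$d(\lambda_k) = 0, \quad d'(\lambda_k) = 0, \quad \ldots, \quad d^{(m_k - 1)}(\lambda_k) = 0 \qquad (k = 1, 2, \ldots, s),$$
i.e.,


$$g(\lambda_k) = h(\lambda_k), \quad g'(\lambda_k) = h'(\lambda_k), \quad \ldots, \quad g^{(m_k - 1)}(\lambda_k) = h^{(m_k - 1)}(\lambda_k) \qquad (k = 1, 2, \ldots, s). \tag{4}$$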


1 See Chapter IV, § 6. 


96 V. FUNCTIONS OF MATRICES


The $m$ numbers
$$f(\lambda_k), \quad f'(\lambda_k), \quad \ldots, \quad f^{(m_k - 1)}(\lambda_k) \qquad (k = 1, 2, \ldots, s) \tag{5}$$


will be called the values of the function $f(\lambda)$ on the spectrum of the matrix $A$ and the set of all these values will be denoted symbolically by $f(\Lambda_A)$. If for a function $f(\lambda)$ the values (5) exist (i.e., have meaning), then we shall say that the function $f(\lambda)$ is defined on the spectrum of the matrix $A$.

Equation (4) shows that the polynomials $g(\lambda)$ and $h(\lambda)$ have the same values on the spectrum of $A$. In symbols:


$$g(\Lambda_A) = h(\Lambda_A).$$


Our argument is reversible: from (4) follows (3) and therefore (2). 


Thus, given a matrix $A$, the values of the polynomial $g(\lambda)$ on the spectrum of $A$ determine the matrix $g(A)$ completely, i.e., all polynomials $g(\lambda)$ that assume the same values on the spectrum of $A$ have one and the same matrix value $g(A)$.
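A short illustration of this principle, using the $3 \times 3$ matrix of the example at the end of Chapter IV, § 6, where $\psi(\lambda) = (\lambda - 2)(\lambda - 4)$: the polynomial $\lambda^3$ and its remainder $28\lambda - 48$ on division by $\psi(\lambda)$ agree on the spectrum and therefore give the same matrix. The helper names are illustrative.

```python
A = [[3, -3, 2], [-1, 5, -2], [-1, 3, 0]]    # psi(lambda) = (lambda - 2)(lambda - 4)


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_at_matrix(coeffs, M):
    """g(M) for g(lambda) = sum(coeffs[k] * lambda**k), by Horner's rule."""
    n = len(M)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, M)
        for i in range(n):
            result[i][i] += c
    return result


def poly_at_scalar(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


g = [0, 0, 0, 1]      # g(lambda) = lambda^3
h = [-48, 28]         # h(lambda) = 28*lambda - 48, remainder of g on division by psi
spectrum = [2, 4]     # psi has simple roots, so only values (no derivatives) enter

same_values_on_spectrum = all(poly_at_scalar(g, x) == poly_at_scalar(h, x) for x in spectrum)
gA = poly_at_matrix(g, A)
hA = poly_at_matrix(h, A)
```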

We postulate that the definition of $f(A)$ in the general case be subject to the same principle: The values of the function $f(\lambda)$ on the spectrum of the matrix $A$ must determine $f(A)$ completely, i.e., all functions $f(\lambda)$ having the same values on the spectrum of $A$ must have the same matrix value $f(A)$.

But then it is obvious that for the general definition of $f(A)$ it is sufficient to look for a polynomial² $g(\lambda)$ that assumes the same values on the spectrum of $A$ as $f(\lambda)$ does and to set:


$$f(A) = g(A).$$


We are thus led to the following definition:
DEFINITION 1: If the function $f(\lambda)$ is defined on the spectrum of the matrix $A$, then
$$f(A) = g(A),$$
where $g(\lambda)$ is an arbitrary polynomial that assumes on the spectrum of $A$ the same values as does $f(\lambda)$:


$$f(\Lambda_A) = g(\Lambda_A).$$


Among all the polynomials with complex coefficients that assume on the spectrum of $A$ the same values as $f(\lambda)$ there is one and only one polynomial


2 It will be proved in § 2 that such an interpolation polynomial always exists and an algorithm for the computation of the coefficients of the interpolation polynomial of least degree will be given.


§ 1. DEFINITION OF FUNCTION OF MATRIX 97


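$r(\lambda)$ that is of degree less than $m$.³ This polynomial $r(\lambda)$ is uniquely determined by the interpolation conditions:


$$r(\lambda_k) = f(\lambda_k), \quad r'(\lambda_k) = f'(\lambda_k), \quad \ldots, \quad r^{(m_k - 1)}(\lambda_k) = f^{(m_k - 1)}(\lambda_k) \qquad (k = 1, 2, \ldots, s). \tag{6}$$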


The polynomial $r(\lambda)$ is called the Lagrange-Sylvester interpolation polynomial for $f(\lambda)$ on the spectrum of $A$. Definition 1 can also be formulated as follows:


DEFINITION 1': Let $f(\lambda)$ be a function defined on the spectrum of a matrix $A$ and $r(\lambda)$ the corresponding Lagrange-Sylvester interpolation polynomial. Then


$$f(A) = r(A).$$
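Definition 1' can be sketched directly: solve the $m$ linear conditions (6) for the coefficients of $r(\lambda)$, then evaluate $r(A)$. This is a minimal sketch and not the book's algorithm (that one, via partial fractions, is given in § 2). It assumes exact rational data, that the caller supplies the values $f^{(j)}(\lambda_k)$, and it uses plain Gauss-Jordan elimination. The function names are hypothetical. The test matrix is the one of Example 1 in § 3, with $\psi(\lambda) = (\lambda - 1)^2(\lambda - 2)$.

```python
from fractions import Fraction
from math import factorial


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def solve(M, b):
    """Solve M x = b over the rationals by Gauss-Jordan elimination (M non-singular)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(v)] for row, v in zip(M, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def lagrange_sylvester(values):
    """values = {lambda_k: [f(lambda_k), f'(lambda_k), ..., f^(m_k - 1)(lambda_k)]}.
    Returns the coefficients (constant term first) of the unique r with
    deg r < m satisfying the interpolation conditions (6)."""
    m = sum(len(v) for v in values.values())
    rows, rhs = [], []
    for lam, derivs in values.items():
        for j, fj in enumerate(derivs):
            # j-th derivative of lambda^i at lam is i!/(i-j)! * lam^(i-j) for i >= j
            rows.append([Fraction(factorial(i), factorial(i - j)) * Fraction(lam) ** (i - j)
                         if i >= j else Fraction(0) for i in range(m)])
            rhs.append(fj)
    return solve(rows, rhs)


def poly_at_matrix(coeffs, M):
    """Horner evaluation of sum(coeffs[k] * M**k)."""
    n = len(M)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, M)
        for i in range(n):
            result[i][i] += c
    return result


A = [[2, -1, 1], [0, 1, 1], [-1, 1, 1]]          # psi = (lambda - 1)^2 (lambda - 2)
r = lagrange_sylvester({1: [1, 5], 2: [32]})      # f(lambda) = lambda^5
f_of_A = poly_at_matrix(r, A)                     # Definition 1': f(A) = r(A)
```

Because $f(\lambda) = \lambda^5$ is itself a polynomial, $f(A)$ obtained this way must coincide with the ordinary power $A^5$, which gives a simple check.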


Note. If the minimal polynomial $\psi(\lambda)$ of a matrix $A$ has no multiple roots⁴ (in (1) $m_1 = m_2 = \cdots = m_s = 1$; $s = m$), then for $f(A)$ to have a meaning it is sufficient that $f(\lambda)$ be defined at the characteristic values $\lambda_1, \lambda_2, \ldots, \lambda_m$. But if $\psi(\lambda)$ has multiple roots, then for some characteristic values the derivatives of $f(\lambda)$ up to a certain order (see (6)) must be defined as well.

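Example 1: Let us consider the matrix⁵


$$H = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix} \qquad (n \times n).$$


Its minimal polynomial is $\lambda^n$. Therefore the values of $f(\lambda)$ on the spectrum of $H$ are the numbers $f(0), f'(0), \ldots, f^{(n-1)}(0)$, and the polynomial $r(\lambda)$ is of the form


$$r(\lambda) = f(0) + \frac{f'(0)}{1!}\lambda + \cdots + \frac{f^{(n-1)}(0)}{(n-1)!}\lambda^{n-1}.$$
Therefore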


3 This polynomial is obtained from any other polynomial having the same spectral values by taking the remainder on division by $\psi(\lambda)$ of that polynomial.


4 In Chapter VI it will be shown that A is a matrix of simple structure (see Chapter 
III, § 8) in this case, and this case only. 


5 The properties of the matrix H were worked out in the example on pp. 13-14. 




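$$f(H) = r(H) = f(0)E + \frac{f'(0)}{1!}H + \cdots + \frac{f^{(n-1)}(0)}{(n-1)!}H^{n-1} = \begin{pmatrix} f(0) & \dfrac{f'(0)}{1!} & \cdots & \dfrac{f^{(n-1)}(0)}{(n-1)!} \\ 0 & f(0) & \ddots & \vdots \\ \vdots & & \ddots & \dfrac{f'(0)}{1!} \\ 0 & 0 & \cdots & f(0) \end{pmatrix}.$$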
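Example 2: Let us consider the matrix
$$J = \begin{pmatrix} \lambda_0 & 1 & 0 & \cdots & 0 \\ 0 & \lambda_0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & \lambda_0 \end{pmatrix} \qquad (n \times n).$$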


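Note that $J = \lambda_0 E + H$, so that $J - \lambda_0 E = H$. The minimal polynomial of $J$ is clearly $(\lambda - \lambda_0)^n$. The interpolation polynomial $r(\lambda)$ of $f(\lambda)$ is given by the equation


$$r(\lambda) = f(\lambda_0) + \frac{f'(\lambda_0)}{1!}(\lambda - \lambda_0) + \cdots + \frac{f^{(n-1)}(\lambda_0)}{(n-1)!}(\lambda - \lambda_0)^{n-1}.$$
Therefore

$$f(J) = r(J) = f(\lambda_0)E + \frac{f'(\lambda_0)}{1!}H + \cdots + \frac{f^{(n-1)}(\lambda_0)}{(n-1)!}H^{n-1} = \begin{pmatrix} f(\lambda_0) & \dfrac{f'(\lambda_0)}{1!} & \cdots & \dfrac{f^{(n-1)}(\lambda_0)}{(n-1)!} \\ 0 & f(\lambda_0) & \ddots & \vdots \\ \vdots & & \ddots & \dfrac{f'(\lambda_0)}{1!} \\ 0 & 0 & \cdots & f(\lambda_0) \end{pmatrix}.$$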
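The triangular pattern of Examples 1 and 2 can be sketched in a few lines: entry $(i, j)$ of $f(J)$ is $f^{(j-i)}(\lambda_0)/(j-i)!$ on and above the diagonal. In the sketch below the derivative values are supplied by the caller, and the test functions $\lambda^4$ (at $\lambda_0 = 3$) and $(1 + \lambda)^3$ (at $\lambda_0 = 0$, i.e., the matrix $H$) are arbitrary choices whose matrix values can also be computed directly. The function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def jordan_block(lam0, n):
    """J = lam0*E + H, with H the n x n matrix of Example 1."""
    return [[lam0 if i == j else (1 if j == i + 1 else 0) for j in range(n)]
            for i in range(n)]


def f_of_jordan_block(derivs):
    """derivs[k] = f^(k)(lam0) for k = 0, ..., n-1.  Entry (i, j) of f(J) is
    f^(j-i)(lam0) / (j-i)! on and above the diagonal, and 0 below it."""
    n = len(derivs)
    return [[Fraction(derivs[j - i], factorial(j - i)) if j >= i else Fraction(0)
             for j in range(n)] for i in range(n)]


# f(lambda) = lambda^4 at lam0 = 3: f = 81, f' = 108, f'' = 108, f''' = 72.
J = jordan_block(3, 4)
fJ = f_of_jordan_block([81, 108, 108, 72])
J4 = matmul(matmul(J, J), matmul(J, J))
```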


2. We mention two properties of functions of matrices.
1. If two matrices $A$ and $B$ are similar and $T$ transforms $A$ into $B$,
$$B = T^{-1}AT,$$


then the matrices $f(A)$ and $f(B)$ are also similar and $T$ transforms $f(A)$ into $f(B)$,
$$f(B) = T^{-1}f(A)T.$$




For two similar matrices have equal minimal polynomials,⁶ so that $f(\lambda)$ assumes the same values on the spectrum of $A$ and of $B$. Therefore there exists an interpolation polynomial $r(\lambda)$ such that $f(A) = r(A)$ and $f(B) = r(B)$. But then it follows⁶ from the equation $r(B) = T^{-1}r(A)T$ that

$$f(B) = T^{-1}f(A)T.$$


2. If $A$ is a quasi-diagonal matrix


$$A = \{A_1, A_2, \ldots, A_u\},$$
then
$$f(A) = \{f(A_1), f(A_2), \ldots, f(A_u)\}.$$


Let us denote by $r(\lambda)$ the Lagrange-Sylvester interpolation polynomial of $f(\lambda)$ on the spectrum of $A$. Then it is easy to see that


$$f(A) = r(A) = \{r(A_1), r(A_2), \ldots, r(A_u)\}. \tag{7}$$


On the other hand, the minimal polynomial $\psi(\lambda)$ of $A$ is an annihilating polynomial for each of the matrices $A_1, A_2, \ldots, A_u$. Therefore it follows from the equation


$$f(\Lambda_A) = r(\Lambda_A)$$
that
$$f(\Lambda_{A_1}) = r(\Lambda_{A_1}), \quad f(\Lambda_{A_2}) = r(\Lambda_{A_2}), \quad \ldots, \quad f(\Lambda_{A_u}) = r(\Lambda_{A_u}).$$
Therefore
$$f(A_1) = r(A_1), \quad \ldots, \quad f(A_u) = r(A_u),$$


and equation (7) can be written as follows:


$$f(A) = \{f(A_1), f(A_2), \ldots, f(A_u)\}. \tag{8}$$
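Both properties reduce, through the interpolation polynomial $r(\lambda)$, to statements about polynomials, so a polynomial test case is enough to illustrate them. The sketch below checks property 1 and formula (8) for the sample polynomial $f(\lambda) = \lambda^3 - 2\lambda + 1$. The matrices `A`, `T`, `A1`, `A2` are arbitrary choices made for illustration, and `block_diag` is a hypothetical helper.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_at_matrix(coeffs, M):
    """Horner evaluation of sum(coeffs[k] * M**k)."""
    n = len(M)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, M)
        for i in range(n):
            result[i][i] += c
    return result


def block_diag(*blocks):
    """Quasi-diagonal matrix {A_1, ..., A_u}; off-diagonal blocks are zero."""
    n = sum(len(b) for b in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                M[offset + i][offset + j] = x
        offset += len(b)
    return M


f = [1, -2, 0, 1]                      # f(lambda) = lambda^3 - 2*lambda + 1

# Property 1: f(T^{-1} A T) = T^{-1} f(A) T
A = [[2, 1], [1, 3]]
T = [[1, 1], [0, 1]]
T_inv = [[1, -1], [0, 1]]
B = matmul(matmul(T_inv, A), T)
prop1 = poly_at_matrix(f, B) == matmul(matmul(T_inv, poly_at_matrix(f, A)), T)

# Property 2, formula (8): f({A_1, A_2}) = {f(A_1), f(A_2)}
A1 = [[2, 1], [0, 2]]
A2 = [[5]]
prop2 = (poly_at_matrix(f, block_diag(A1, A2))
         == block_diag(poly_at_matrix(f, A1), poly_at_matrix(f, A2)))
```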


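Example 1: If the matrix $A$ is of simple structure


$$A = T\{\lambda_1, \lambda_2, \ldots, \lambda_n\}T^{-1},$$
then
$$f(A) = T\{f(\lambda_1), f(\lambda_2), \ldots, f(\lambda_n)\}T^{-1}.$$


$f(A)$ has meaning if the function $f(\lambda)$ is defined at $\lambda_1, \lambda_2, \ldots, \lambda_n$.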




6 From $B = T^{-1}AT$ it follows that $B^k = T^{-1}A^kT$ $(k = 0, 1, 2, \ldots)$. Hence for every polynomial $g(\lambda)$ we have $g(B) = T^{-1}g(A)T$. Therefore it follows from $g(A) = O$ that $g(B) = O$, and vice versa.




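Example 2: Let $J$ be a matrix of the following quasi-diagonal form


$$J = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_u \end{pmatrix}, \qquad J_i = \begin{pmatrix} \lambda_i & 1 & \cdots & 0 \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & & \ddots & 1 \\ 0 & 0 & \cdots & \lambda_i \end{pmatrix} \quad (v_i \times v_i;\ i = 1, 2, \ldots, u).$$


All the elements in the non-diagonal blocks are zero. By (8) (see also the example on pp. 12-13),


$$f(J) = \begin{pmatrix} f(J_1) & & \\ & \ddots & \\ & & f(J_u) \end{pmatrix}, \qquad f(J_i) = \begin{pmatrix} f(\lambda_i) & \dfrac{f'(\lambda_i)}{1!} & \cdots & \dfrac{f^{(v_i - 1)}(\lambda_i)}{(v_i - 1)!} \\ 0 & f(\lambda_i) & \ddots & \vdots \\ \vdots & & \ddots & \dfrac{f'(\lambda_i)}{1!} \\ 0 & 0 & \cdots & f(\lambda_i) \end{pmatrix}.$$


Here, as in the matrix $J$, all the elements in the non-diagonal blocks are also zero.⁷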


7 It will be established later (Chapter VI, § 6 or Chapter VII, § 7) that an arbitrary matrix $A = \| a_{ik} \|_1^n$ is always similar to some matrix of the form $J$: $A = TJT^{-1}$. Therefore (see 1. on p. 98) we always have $f(A) = Tf(J)T^{-1}$.


§ 2. LAGRANGE-SYLVESTER INTERPOLATION POLYNOMIAL 101


§ 2. The Lagrange-Sylvester Interpolation Polynomial 


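1. To begin with, we consider the case in which the characteristic equation $|\lambda E - A| = 0$ has no multiple roots. The roots of this equation, the characteristic values of the matrix $A$, will be denoted by $\lambda_1, \lambda_2, \ldots, \lambda_n$. Then


$$\psi(\lambda) = |\lambda E - A| = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n),$$
and condition (6) can be written as follows:


$$r(\lambda_k) = f(\lambda_k) \qquad (k = 1, 2, \ldots, n).$$


In this case, $r(\lambda)$ is the ordinary Lagrange interpolation polynomial for the function $f(\lambda)$ at the points $\lambda_1, \lambda_2, \ldots, \lambda_n$:


$$r(\lambda) = \sum_{k=1}^{n} \frac{(\lambda - \lambda_1) \cdots (\lambda - \lambda_{k-1})(\lambda - \lambda_{k+1}) \cdots (\lambda - \lambda_n)}{(\lambda_k - \lambda_1) \cdots (\lambda_k - \lambda_{k-1})(\lambda_k - \lambda_{k+1}) \cdots (\lambda_k - \lambda_n)}\, f(\lambda_k).$$


By Definition 1'


$$f(A) = \sum_{k=1}^{n} \frac{(A - \lambda_1 E) \cdots (A - \lambda_{k-1} E)(A - \lambda_{k+1} E) \cdots (A - \lambda_n E)}{(\lambda_k - \lambda_1) \cdots (\lambda_k - \lambda_{k-1})(\lambda_k - \lambda_{k+1}) \cdots (\lambda_k - \lambda_n)}\, f(\lambda_k).$$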
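The Lagrange formula for $f(A)$ can be sketched directly. The sketch below is a minimal illustration, assuming the caller passes the distinct roots of the minimal polynomial, all simple; the function name is hypothetical. The matrix $\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ with characteristic values 1 and 3 is an arbitrary test case, and the last check also anticipates case 2 below, where the characteristic polynomial has a multiple root but the minimal polynomial does not.

```python
from fractions import Fraction


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def lagrange_matrix_function(A, eigenvalues, f):
    """f(A) = sum_k f(lambda_k) * prod_{j != k} (A - lambda_j E) / (lambda_k - lambda_j),
    valid when the minimal polynomial of A has only the simple roots `eigenvalues`."""
    n = len(A)
    total = [[Fraction(0)] * n for _ in range(n)]
    for k, lk in enumerate(eigenvalues):
        P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
        denom = Fraction(1)
        for j, lj in enumerate(eigenvalues):
            if j != k:
                shifted = [[A[r][c] - (lj if r == c else 0) for c in range(n)]
                           for r in range(n)]
                P = matmul(P, shifted)
                denom *= lk - lj
        coeff = Fraction(f(lk)) / denom
        total = [[total[r][c] + coeff * P[r][c] for c in range(n)] for r in range(n)]
    return total


A = [[2, 1], [1, 2]]                 # characteristic values 1 and 3 (simple)
cube = lagrange_matrix_function(A, [1, 3], lambda x: x ** 3)
```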


2. Let us assume now that the characteristic polynomial has multiple roots, but that the minimal polynomial, which is a divisor of the characteristic polynomial, has only simple roots:⁸


$$\psi(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_m).$$


In this case (as in the preceding one) all the exponents $m_k$ in (1) are equal to 1, and the equation (6) takes the form
$$r(\lambda_k) = f(\lambda_k) \qquad (k = 1, 2, \ldots, m).$$
$r(\lambda)$ is again the ordinary Lagrange interpolation polynomial and


$$f(A) = \sum_{k=1}^{m} \frac{(A - \lambda_1 E) \cdots (A - \lambda_{k-1} E)(A - \lambda_{k+1} E) \cdots (A - \lambda_m E)}{(\lambda_k - \lambda_1) \cdots (\lambda_k - \lambda_{k-1})(\lambda_k - \lambda_{k+1}) \cdots (\lambda_k - \lambda_m)}\, f(\lambda_k).$$


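3. We now consider the general case:
$$\psi(\lambda) = (\lambda - \lambda_1)^{m_1} (\lambda - \lambda_2)^{m_2} \cdots (\lambda - \lambda_s)^{m_s} \qquad (m_1 + m_2 + \cdots + m_s = m).$$


We represent the rational function $\dfrac{r(\lambda)}{\psi(\lambda)}$, where the degree of $r(\lambda)$ is less than the degree of $\psi(\lambda)$, as a sum of partial fractions: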


8 See footnote 4. 




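$$\frac{r(\lambda)}{\psi(\lambda)} = \sum_{k=1}^{s} \left[ \frac{\alpha_{k1}}{(\lambda - \lambda_k)^{m_k}} + \frac{\alpha_{k2}}{(\lambda - \lambda_k)^{m_k - 1}} + \cdots + \frac{\alpha_{k m_k}}{\lambda - \lambda_k} \right], \tag{9}$$
where $\alpha_{kj}$ $(j = 1, 2, \ldots, m_k;\ k = 1, 2, \ldots, s)$ are certain constants.


In order to determine the numerators $\alpha_{kj}$ of the partial fractions we multiply both sides of (9) by $(\lambda - \lambda_k)^{m_k}$ and denote by $\psi_k(\lambda)$ the polynomial $\dfrac{\psi(\lambda)}{(\lambda - \lambda_k)^{m_k}}$. Then we obtain:

$$\frac{r(\lambda)}{\psi_k(\lambda)} = \alpha_{k1} + \alpha_{k2}(\lambda - \lambda_k) + \cdots + \alpha_{k m_k}(\lambda - \lambda_k)^{m_k - 1} + (\lambda - \lambda_k)^{m_k}\rho_k(\lambda) \qquad (k = 1, 2, \ldots, s), \tag{10}$$


where $\rho_k(\lambda)$ is a rational function, regular for $\lambda = \lambda_k$.⁹


Hence
$$\alpha_{k1} = \left[\frac{r(\lambda)}{\psi_k(\lambda)}\right]_{\lambda = \lambda_k}, \quad \alpha_{k2} = \left[\frac{r(\lambda)}{\psi_k(\lambda)}\right]'_{\lambda = \lambda_k}, \quad \ldots, \quad \alpha_{k m_k} = \frac{1}{(m_k - 1)!}\left[\frac{r(\lambda)}{\psi_k(\lambda)}\right]^{(m_k - 1)}_{\lambda = \lambda_k} \qquad (k = 1, 2, \ldots, s). \tag{11}$$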


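Formulas (11) show that the numerators $\alpha_{kj}$ on the right-hand side of (9) are expressible in terms of the values of the polynomial $r(\lambda)$ on the spectrum of $A$, and these values are known: they are equal to the corresponding values of the function $f(\lambda)$ and its derivatives. Therefore


$$\alpha_{k1} = \frac{f(\lambda_k)}{\psi_k(\lambda_k)}, \quad \alpha_{k2} = \left[\frac{f(\lambda)}{\psi_k(\lambda)}\right]'_{\lambda = \lambda_k}, \quad \ldots, \quad \alpha_{k m_k} = \frac{1}{(m_k - 1)!}\left[\frac{f(\lambda)}{\psi_k(\lambda)}\right]^{(m_k - 1)}_{\lambda = \lambda_k} \qquad (k = 1, 2, \ldots, s). \tag{12}$$
Formulas (12) may be abbreviated as follows:
$$\alpha_{kj} = \frac{1}{(j - 1)!}\left[\frac{f(\lambda)}{\psi_k(\lambda)}\right]^{(j - 1)}_{\lambda = \lambda_k} \qquad (j = 1, 2, \ldots, m_k;\ k = 1, 2, \ldots, s). \tag{13}$$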


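When all the $\alpha_{kj}$ have been found, we can determine $r(\lambda)$ from the following formula, which is obtained from (9) by multiplying both sides by $\psi(\lambda)$:


$$r(\lambda) = \sum_{k=1}^{s} \left[\alpha_{k1} + \alpha_{k2}(\lambda - \lambda_k) + \cdots + \alpha_{k m_k}(\lambda - \lambda_k)^{m_k - 1}\right]\psi_k(\lambda). \tag{14}$$


In this formula the expression in brackets that multiplies $\psi_k(\lambda)$ is, by (13), equal to the sum of the first $m_k$ terms of the Taylor expansion of $f(\lambda)/\psi_k(\lambda)$ in powers of $\lambda - \lambda_k$.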
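Formulas (13) and (14) translate into truncated power-series arithmetic: expand $1/\psi_k(\lambda)$ about $\lambda_k$, multiply by the Taylor polynomial of $f$, keep $m_k$ terms, and multiply the resulting bracket by $\psi_k(\lambda)$. Below is a minimal sketch in exact arithmetic, assuming the caller supplies the values of $f$ on the spectrum; the function names are illustrative. For a polynomial $f$ the result must be the remainder of $f$ on division by $\psi(\lambda)$, which gives an easy check.

```python
from fractions import Fraction
from math import factorial


def trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def ppow(p, e):
    r = [Fraction(1)]
    for _ in range(e):
        r = pmul(r, p)
    return r


def series_inverse(c, order):
    """First `order` Taylor coefficients of 1/c(t), where c[0] != 0."""
    inv = [Fraction(1) / c[0]]
    for j in range(1, order):
        s = sum(c[i] * inv[j - i] for i in range(1, min(j, len(c) - 1) + 1))
        inv.append(-s / c[0])
    return inv


def lagrange_sylvester_pf(mult, values):
    """mult = {lambda_k: m_k}; values = {lambda_k: [f(lambda_k), ..., f^(m_k-1)(lambda_k)]}.
    Returns r(lambda), constant term first, built from (13) and (14)."""
    r = [Fraction(0)]
    for lk, mk in mult.items():
        psi_k_t = [Fraction(1)]      # psi_k in powers of t = lambda - lambda_k
        psi_k = [Fraction(1)]        # psi_k in powers of lambda
        for li, mi in mult.items():
            if li != lk:
                psi_k_t = pmul(psi_k_t, ppow([Fraction(lk - li), Fraction(1)], mi))
                psi_k = pmul(psi_k, ppow([Fraction(-li), Fraction(1)], mi))
        inv = series_inverse(psi_k_t, mk)
        taylor_f = [Fraction(v) / factorial(j) for j, v in enumerate(values[lk])]
        # (13): alpha_{k,j+1} is the coefficient of t^j in f / psi_k
        alpha = [sum(taylor_f[i] * inv[j - i] for i in range(j + 1)) for j in range(mk)]
        bracket = [Fraction(0)]
        for j, a in enumerate(alpha):
            bracket = padd(bracket, [a * c for c in ppow([Fraction(-lk), Fraction(1)], j)])
        r = padd(r, pmul(bracket, psi_k))       # (14)
    return trim(r)
```

With $\psi(\lambda) = (\lambda - 1)^2(\lambda - 2)$ and $f(\lambda) = \lambda^3$ the sketch gives $r(\lambda) = 4\lambda^2 - 5\lambda + 2 = \lambda^3 - \psi(\lambda)$, as it should.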


9 I.e., that does not become infinite for $\lambda = \lambda_k$.




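Note. The Lagrange-Sylvester interpolation polynomial can be obtained by a limiting process from the Lagrange interpolation polynomial.
Let


$$\psi(\lambda) = (\lambda - \lambda_1)^{m_1}(\lambda - \lambda_2)^{m_2} \cdots (\lambda - \lambda_s)^{m_s} \qquad \Big(\sum_{k=1}^{s} m_k = m\Big).$$
We denote by $L(\lambda)$ the Lagrange interpolation polynomial constructed for the $m$ points


$$\lambda_1^{(1)}, \lambda_1^{(2)}, \ldots, \lambda_1^{(m_1)};\ \lambda_2^{(1)}, \ldots, \lambda_2^{(m_2)};\ \ldots;\ \lambda_s^{(1)}, \ldots, \lambda_s^{(m_s)}$$


and the values


$$f(\lambda_1^{(1)}), \ldots, f(\lambda_1^{(m_1)});\ \ldots;\ f(\lambda_s^{(1)}), \ldots, f(\lambda_s^{(m_s)}).$$


Then it is not difficult to show that the required Lagrange-Sylvester polynomial is determined by the formula


$$r(\lambda) = \lim_{\lambda_k^{(j)} \to \lambda_k} L(\lambda) \qquad (j = 1, 2, \ldots, m_k;\ k = 1, 2, \ldots, s).$$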


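Example:
$$\psi(\lambda) = (\lambda - \lambda_1)^2(\lambda - \lambda_2)^3 \qquad (m = 5).$$
Then
$$\frac{r(\lambda)}{\psi(\lambda)} = \frac{\alpha}{(\lambda - \lambda_1)^2} + \frac{\beta}{\lambda - \lambda_1} + \frac{\gamma}{(\lambda - \lambda_2)^3} + \frac{\delta}{(\lambda - \lambda_2)^2} + \frac{\varepsilon}{\lambda - \lambda_2}.$$


Hence
$$r(\lambda) = [\alpha + \beta(\lambda - \lambda_1)](\lambda - \lambda_2)^3 + [\gamma + \delta(\lambda - \lambda_2) + \varepsilon(\lambda - \lambda_2)^2](\lambda - \lambda_1)^2$$
and therefore


$$f(A) = r(A) = [\alpha E + \beta(A - \lambda_1 E)](A - \lambda_2 E)^3 + [\gamma E + \delta(A - \lambda_2 E) + \varepsilon(A - \lambda_2 E)^2](A - \lambda_1 E)^2.$$


$\alpha, \beta, \gamma, \delta$, and $\varepsilon$ can be found from the following formulas:


$$\alpha = \frac{f(\lambda_1)}{(\lambda_1 - \lambda_2)^3}, \qquad \beta = \left[\frac{f(\lambda)}{(\lambda - \lambda_2)^3}\right]'_{\lambda = \lambda_1} = -\frac{3 f(\lambda_1)}{(\lambda_1 - \lambda_2)^4} + \frac{f'(\lambda_1)}{(\lambda_1 - \lambda_2)^3},$$
$$\gamma = \frac{f(\lambda_2)}{(\lambda_2 - \lambda_1)^2}, \qquad \delta = \left[\frac{f(\lambda)}{(\lambda - \lambda_1)^2}\right]'_{\lambda = \lambda_2} = -\frac{2 f(\lambda_2)}{(\lambda_2 - \lambda_1)^3} + \frac{f'(\lambda_2)}{(\lambda_2 - \lambda_1)^2},$$


$$\varepsilon = \frac{1}{2}\left[\frac{f(\lambda)}{(\lambda - \lambda_1)^2}\right]''_{\lambda = \lambda_2} = \frac{3 f(\lambda_2)}{(\lambda_2 - \lambda_1)^4} - \frac{2 f'(\lambda_2)}{(\lambda_2 - \lambda_1)^3} + \frac{f''(\lambda_2)}{2(\lambda_2 - \lambda_1)^2}.$$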




§ 3. Other Forms of the Definition of f(A). 
The Components of the Matrix A 


1. Let us return to the formula (14) for $r(\lambda)$. When we substitute in (14) the expressions (12) for the coefficients $\alpha_{kj}$ and combine the terms that contain one and the same value of the function $f(\lambda)$ or of one of its derivatives, we represent $r(\lambda)$ in the form


$$r(\lambda) = \sum_{k=1}^{s}\left[f(\lambda_k)\varphi_{k1}(\lambda) + f'(\lambda_k)\varphi_{k2}(\lambda) + \cdots + f^{(m_k - 1)}(\lambda_k)\varphi_{k m_k}(\lambda)\right]. \tag{15}$$


Here $\varphi_{kj}(\lambda)$ $(j = 1, 2, \ldots, m_k;\ k = 1, 2, \ldots, s)$ are easily computable polynomials in $\lambda$ of degree less than $m$. These polynomials are completely determined when $\psi(\lambda)$ is given and do not depend on the choice of the function $f(\lambda)$. The number of these polynomials is equal to the number of values of the function $f(\lambda)$ on the spectrum of $A$, i.e., equal to $m$ ($m$ is the degree of the minimal polynomial $\psi(\lambda)$). Each function $\varphi_{kj}(\lambda)$ is the Lagrange-Sylvester interpolation polynomial for the function whose values on the spectrum of $A$ are all equal to zero with the exception of $f^{(j-1)}(\lambda_k)$, which is equal to 1.

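All the polynomials $\varphi_{kj}(\lambda)$ $(j = 1, 2, \ldots, m_k;\ k = 1, 2, \ldots, s)$ are linearly independent. For suppose that


$$\sum_{k=1}^{s}\sum_{j=1}^{m_k} c_{kj}\varphi_{kj}(\lambda) = 0.$$


Let us determine the interpolation polynomial $r(\lambda)$ from the $m$ conditions:


$$r^{(j-1)}(\lambda_k) = c_{kj} \qquad (j = 1, 2, \ldots, m_k;\ k = 1, 2, \ldots, s). \tag{16}$$
Then by (15) and (16)
$$r(\lambda) = \sum_{k=1}^{s}\sum_{j=1}^{m_k} c_{kj}\varphi_{kj}(\lambda) = 0$$
and, therefore, by (16)
$$c_{kj} = 0 \qquad (j = 1, 2, \ldots, m_k;\ k = 1, 2, \ldots, s).$$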


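From (15) we deduce the fundamental formula for $f(A)$:


$$f(A) = \sum_{k=1}^{s}\left[f(\lambda_k)Z_{k1} + f'(\lambda_k)Z_{k2} + \cdots + f^{(m_k - 1)}(\lambda_k)Z_{k m_k}\right], \tag{17}$$


where
$$Z_{kj} = \varphi_{kj}(A) \qquad (j = 1, 2, \ldots, m_k;\ k = 1, 2, \ldots, s). \tag{18}$$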


§ 3. OTHER FORMS OF DEFINITION OF f(A). COMPONENTS 105


The matrices $Z_{kj}$ are completely determined when $A$ is given and do not depend on the choice of the function $f(\lambda)$. On the right-hand side of (17) the function $f(\lambda)$ is represented only by its values on the spectrum of $A$. The matrices $Z_{kj}$ $(j = 1, 2, \ldots, m_k;\ k = 1, 2, \ldots, s)$ will be called the constituent matrices or components of the given matrix $A$.
The components $Z_{kj}$ are linearly independent.
For suppose that


$$\sum_{k=1}^{s}\sum_{j=1}^{m_k} c_{kj}Z_{kj} = O.$$


Then by (18)
$$\chi(A) = O, \tag{19}$$
where
$$\chi(\lambda) = \sum_{k=1}^{s}\sum_{j=1}^{m_k} c_{kj}\varphi_{kj}(\lambda). \tag{20}$$


Since by (20) the degree of $\chi(\lambda)$ is less than $m$, the degree of the minimal polynomial $\psi(\lambda)$, it follows from (19) that


$$\chi(\lambda) \equiv 0.$$


But then, since the $m$ functions $\varphi_{kj}(\lambda)$ are linearly independent, (20) implies that
$$c_{kj} = 0 \qquad (j = 1, 2, \ldots, m_k;\ k = 1, 2, \ldots, s),$$


and this is what we had to prove.


2. From the linear independence of the constituent matrices $Z_{kj}$ it follows, among other things, that none of these matrices can be zero. Let us also note that any two components $Z_{kj}$ are permutable among each other and with $A$, because they are all scalar polynomials in $A$.

The formula (17) for $f(A)$ is particularly convenient to use when it is necessary to deal with several functions of one and the same matrix $A$, or when the function $f(\lambda)$ depends not only on $\lambda$, but also on some parameter $t$. In the latter case, the components $Z_{kj}$ on the right-hand side of (17) do not depend on $t$, and the parameter $t$ enters only into the scalar coefficients of the matrices.

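In the example at the end of § 2, where $\psi(\lambda) = (\lambda - \lambda_1)^2(\lambda - \lambda_2)^3$, we may represent $r(\lambda)$ in the form


$$r(\lambda) = f(\lambda_1)\varphi_{11}(\lambda) + f'(\lambda_1)\varphi_{12}(\lambda) + f(\lambda_2)\varphi_{21}(\lambda) + f'(\lambda_2)\varphi_{22}(\lambda) + f''(\lambda_2)\varphi_{23}(\lambda),$$


where


$$\varphi_{11}(\lambda) = \left(\frac{\lambda - \lambda_2}{\lambda_1 - \lambda_2}\right)^3\left[1 - \frac{3(\lambda - \lambda_1)}{\lambda_1 - \lambda_2}\right], \qquad \varphi_{12}(\lambda) = \frac{(\lambda - \lambda_1)(\lambda - \lambda_2)^3}{(\lambda_1 - \lambda_2)^3},$$


$$\varphi_{21}(\lambda) = \left(\frac{\lambda - \lambda_1}{\lambda_2 - \lambda_1}\right)^2\left[1 - \frac{2(\lambda - \lambda_2)}{\lambda_2 - \lambda_1} + \frac{3(\lambda - \lambda_2)^2}{(\lambda_2 - \lambda_1)^2}\right],$$


$$\varphi_{22}(\lambda) = \frac{(\lambda - \lambda_1)^2(\lambda - \lambda_2)}{(\lambda_2 - \lambda_1)^2}\left[1 - \frac{2(\lambda - \lambda_2)}{\lambda_2 - \lambda_1}\right], \qquad \varphi_{23}(\lambda) = \frac{(\lambda - \lambda_1)^2(\lambda - \lambda_2)^2}{2(\lambda_2 - \lambda_1)^2}.$$


Therefore


$$f(A) = f(\lambda_1)Z_{11} + f'(\lambda_1)Z_{12} + f(\lambda_2)Z_{21} + f'(\lambda_2)Z_{22} + f''(\lambda_2)Z_{23},$$
where


$$Z_{11} = \varphi_{11}(A) = \frac{1}{(\lambda_1 - \lambda_2)^3}(A - \lambda_2 E)^3\left[E - \frac{3}{\lambda_1 - \lambda_2}(A - \lambda_1 E)\right],$$


$$Z_{12} = \varphi_{12}(A) = \frac{1}{(\lambda_1 - \lambda_2)^3}(A - \lambda_1 E)(A - \lambda_2 E)^3, \quad \ldots$$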
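3. When the matrix $A$ is given and its components have actually to be found, we can set in the fundamental formula (17) $f(\mu) = \dfrac{1}{\lambda - \mu}$, where $\lambda$ is a parameter. Then we obtain


$$\frac{C(\lambda)}{\psi(\lambda)} = (\lambda E - A)^{-1} = \sum_{k=1}^{s}\left[\frac{Z_{k1}}{\lambda - \lambda_k} + \frac{1!\,Z_{k2}}{(\lambda - \lambda_k)^2} + \cdots + \frac{(m_k - 1)!\,Z_{k m_k}}{(\lambda - \lambda_k)^{m_k}}\right], \tag{21}$$


where $C(\lambda)$ is the reduced adjoint matrix of $\lambda E - A$ (Chapter IV, § 6).¹⁰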

The matrices $(j - 1)!\,Z_{kj}$ are the numerators of the partial fractions in the decomposition (21), and by analogy with (9) they may be expressed by the values of $C(\lambda)$ on the spectrum of $A$ by formulas similar to (11):


$$(m_k - 1)!\,Z_{k m_k} = \frac{C(\lambda_k)}{\psi_k(\lambda_k)}, \qquad (m_k - 2)!\,Z_{k, m_k - 1} = \left[\frac{C(\lambda)}{\psi_k(\lambda)}\right]'_{\lambda = \lambda_k}, \quad \ldots;$$
hence
$$Z_{kj} = \frac{1}{(j - 1)!\,(m_k - j)!}\left[\frac{C(\lambda)}{\psi_k(\lambda)}\right]^{(m_k - j)}_{\lambda = \lambda_k} \qquad (j = 1, 2, \ldots, m_k;\ k = 1, 2, \ldots, s). \tag{22}$$


When we replace the constituent matrices in (17) by their expressions (22), we can represent the fundamental formula (17) in the form


10 For $f(\mu) = \dfrac{1}{\lambda - \mu}$ we have $f(A) = (\lambda E - A)^{-1}$. Indeed, $f(A) = r(A)$, where $r(\mu)$ is the Lagrange-Sylvester interpolation polynomial. From the fact that $f(\mu)$ and $r(\mu)$ coincide on the spectrum of $A$ it follows that $(\lambda - \mu)r(\mu)$ and $(\lambda - \mu)f(\mu) = 1$ coincide on this spectrum. Hence $(\lambda E - A)r(A) = (\lambda E - A)f(A) = E$.




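$$f(A) = \sum_{k=1}^{s}\frac{1}{(m_k - 1)!}\left[\frac{C(\lambda)}{\psi_k(\lambda)}\,f(\lambda)\right]^{(m_k - 1)}_{\lambda = \lambda_k}. \tag{23}$$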
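Example 1:¹¹
$$A = \left(\begin{array}{rrr|r} 2 & -1 & 1 & 2 \\ 0 & 1 & 1 & 2 \\ -1 & 1 & 1 & 1 \end{array}\right), \qquad \lambda E - A = \begin{pmatrix} \lambda - 2 & 1 & -1 \\ 0 & \lambda - 1 & -1 \\ 1 & -1 & \lambda - 1 \end{pmatrix}.$$


In this case $\Delta(\lambda) = |\lambda E - A| = (\lambda - 1)^2(\lambda - 2)$. Since the minor of the element in the first row and second column of $\lambda E - A$ is equal to 1, we have $D_2(\lambda) = 1$ and, therefore,


$$\psi(\lambda) = \Delta(\lambda) = (\lambda - 1)^2(\lambda - 2) = \lambda^3 - 4\lambda^2 + 5\lambda - 2,$$
$$\Psi(\lambda, \mu) = \frac{\psi(\mu) - \psi(\lambda)}{\mu - \lambda} = \mu^2 + (\lambda - 4)\mu + \lambda^2 - 4\lambda + 5$$


and


$$C(\lambda) = \Psi(\lambda E, A) = A^2 + (\lambda - 4)A + (\lambda^2 - 4\lambda + 5)E$$


$$= \left(\begin{array}{rrr|r} 3 & -2 & 2 & 3 \\ -1 & 2 & 2 & 3 \\ -3 & 3 & 1 & 1 \end{array}\right) + (\lambda - 4)\left(\begin{array}{rrr|r} 2 & -1 & 1 & 2 \\ 0 & 1 & 1 & 2 \\ -1 & 1 & 1 & 1 \end{array}\right) + (\lambda^2 - 4\lambda + 5)\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
The fundamental formula has in this case the form
$$f(A) = f(1)Z_{11} + f'(1)Z_{12} + f(2)Z_{21}. \tag{24}$$
Setting $f(\mu) = \dfrac{1}{\lambda - \mu}$, we find:
$$\frac{C(\lambda)}{\psi(\lambda)} = \frac{Z_{11}}{\lambda - 1} + \frac{Z_{12}}{(\lambda - 1)^2} + \frac{Z_{21}}{\lambda - 2};$$
hence
$$Z_{11} = -C(1) - C'(1), \qquad Z_{12} = -C(1), \qquad Z_{21} = C(2).$$


We now use the above expression for $C(\lambda)$, compute $Z_{11}, Z_{12}, Z_{21}$, and substitute the results obtained in (24):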


11 The elements of the sum column are printed in italics and are used for checking the computation. When we multiply the rows of $A$ into the sum column of $B$ we obtain the sum column of $AB$.




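$$f(A) = f(1)\begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & -1 & 1 \end{pmatrix} + f'(1)\begin{pmatrix} 1 & -1 & 1 \\ 1 & -1 & 1 \\ 0 & 0 & 0 \end{pmatrix} + f(2)\begin{pmatrix} 0 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} f(1) + f'(1) & -f'(1) & f'(1) \\ f(1) + f'(1) - f(2) & -f'(1) + f(2) & f'(1) \\ f(1) - f(2) & -f(1) + f(2) & f(1) \end{pmatrix}. \tag{25}$$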
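The second method of this example can be replayed in a few lines. The sketch below (exact arithmetic; the helper names are illustrative) builds $C(\lambda)$ from the formula above, forms $Z_{11} = -C(1) - C'(1)$, $Z_{12} = -C(1)$, $Z_{21} = C(2)$, checks formula (25) against $A^5$ computed directly, and checks the decomposition (21) at the sample point $\lambda = 3$, chosen arbitrarily off the spectrum.

```python
from fractions import Fraction

A = [[2, -1, 1], [0, 1, 1], [-1, 1, 1]]
n = 3
E = [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def lin_comb(*terms):
    """Sum of c * M over the (c, M) pairs given."""
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)] for i in range(n)]


A2 = matmul(A, A)


def C(lam):
    """C(lambda) = A^2 + (lambda - 4)A + (lambda^2 - 4 lambda + 5)E."""
    return lin_comb((1, A2), (lam - 4, A), (lam * lam - 4 * lam + 5, E))


def C_prime(lam):
    """Derivative of C(lambda) with respect to lambda: A + (2 lambda - 4)E."""
    return lin_comb((1, A), (2 * lam - 4, E))


Z11 = lin_comb((-1, C(1)), (-1, C_prime(1)))
Z12 = lin_comb((-1, C(1)))
Z21 = C(2)


def f_of_A(f1, df1, f2):
    """Formula (24): f(A) = f(1) Z11 + f'(1) Z12 + f(2) Z21."""
    return lin_comb((f1, Z11), (df1, Z12), (f2, Z21))


A5 = matmul(matmul(A2, A2), A)
# (21) at lambda = 3: (3E - A)^{-1} = Z11/(3-1) + 1! Z12/(3-1)^2 + Z21/(3-2)
resolvent = lin_comb((Fraction(1, 2), Z11), (Fraction(1, 4), Z12), (1, Z21))
check_21 = matmul(lin_comb((3, E), (-1, A)), resolvent) == E
```

With $f(\lambda) = \lambda^5$ one has $f(1) = 1$, $f'(1) = 5$, $f(2) = 32$, so `f_of_A(1, 5, 32)` should reproduce $A^5$.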


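Example 2: Let us show that we can determine $f(A)$ starting only from the fundamental formula. Again let


$$A = \begin{pmatrix} 2 & -1 & 1 \\ 0 & 1 & 1 \\ -1 & 1 & 1 \end{pmatrix}, \qquad \psi(\lambda) = (\lambda - 1)^2(\lambda - 2).$$
Then
$$f(A) = f(1)Z_1 + f'(1)Z_2 + f(2)Z_3. \tag{24'}$$


In (24') we substitute for $f(\lambda)$ in succession $1$, $\lambda - 1$, $(\lambda - 1)^2$:


$$Z_1 + Z_3 = E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
$$Z_2 + Z_3 = A - E = \left(\begin{array}{rrr|r} 1 & -1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ -1 & 1 & 0 & 0 \end{array}\right),$$
$$Z_3 = (A - E)^2 = \left(\begin{array}{rrr|r} 0 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ -1 & 1 & 0 & 0 \end{array}\right).$$


Subtracting the third equation from the first two term by term, we can determine all the $Z$. Substituting in (24'), we obtain the expression for $f(A)$.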

4. The examples we have analyzed illustrate three methods of practical computation of $f(A)$. In the first method, we found the interpolation polynomial $r(\lambda)$ and put $f(A) = r(A)$. In the second method, we made use of the decomposition (21) and expressed the components $Z_{kj}$ in (17) by the values of the reduced adjoint matrix $C(\lambda)$ on the spectrum of $A$. In the third method, we started from the fundamental formula (17) and substituted in succession certain simple polynomials for $f(\lambda)$; from the linear equations so obtained we determined the constituent matrices $Z_{kj}$.




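The third method is perhaps the most convenient for practical purposes. In the general case it can be stated as follows:

In (17) we substitute for $f(\lambda)$ successively certain polynomials $g_1(\lambda), g_2(\lambda), \ldots, g_m(\lambda)$:


$$g_i(A) = \sum_{k=1}^{s}\left[g_i(\lambda_k)Z_{k1} + g_i'(\lambda_k)Z_{k2} + \cdots + g_i^{(m_k - 1)}(\lambda_k)Z_{k m_k}\right] \qquad (i = 1, 2, \ldots, m). \tag{26}$$


From the $m$ equations (26) we determine the matrices $Z_{kj}$ and substitute the expressions so obtained in (17).
The result of eliminating $Z_{kj}$ from the $(m + 1)$ equations (26) and (17) can be written in the form


$$\begin{vmatrix} f(A) & f(\lambda_1) & \cdots & f^{(m_1 - 1)}(\lambda_1) & \cdots & f(\lambda_s) & \cdots & f^{(m_s - 1)}(\lambda_s) \\ g_1(A) & g_1(\lambda_1) & \cdots & g_1^{(m_1 - 1)}(\lambda_1) & \cdots & g_1(\lambda_s) & \cdots & g_1^{(m_s - 1)}(\lambda_s) \\ \vdots & \vdots & & \vdots & & \vdots & & \vdots \\ g_m(A) & g_m(\lambda_1) & \cdots & g_m^{(m_1 - 1)}(\lambda_1) & \cdots & g_m(\lambda_s) & \cdots & g_m^{(m_s - 1)}(\lambda_s) \end{vmatrix} = O.$$


Expanding this determinant with respect to the elements of the first column, we obtain the required expression for $f(A)$. As the factor of $f(A)$ we have here the determinant $\Delta = |g_i^{(j)}(\lambda_k)|$ (in the $i$-th row of $\Delta$ there are found the values of the polynomial $g_i(\lambda)$ on the spectrum of $A$; $i = 1, 2, \ldots, m$). In order to determine $f(A)$ we must have $\Delta \neq 0$. This will be so if no linear combination¹² of the polynomials vanishes completely on the spectrum of $A$, i.e., is divisible by $\psi(\lambda)$.
The condition $\Delta \neq 0$ is always satisfied when the degrees of the polynomials $g_1(\lambda), g_2(\lambda), \ldots, g_m(\lambda)$ are $0, 1, \ldots, m - 1$, respectively.¹³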
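The third method can be sketched in general form. Following the remark about degrees $0, 1, \ldots, m-1$, the sketch assumes $g_i(\lambda) = \lambda^{i-1}$, so the coefficient matrix of (26) is a confluent Vandermonde matrix and $\Delta \neq 0$; the system is solved entry by entry in exact arithmetic. The function names are illustrative, and the roots and multiplicities of the minimal polynomial must be supplied by the caller. Applied to the matrix of Example 1 it should reproduce $Z_{11}$, $Z_{12}$, $Z_{21}$ found there.

```python
from fractions import Fraction
from math import factorial


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def solve(M, b):
    """Gauss-Jordan elimination over the rationals (M non-singular)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(v)] for row, v in zip(M, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def components(A, mult):
    """Constituent matrices from (26) with g_i(lambda) = lambda^(i-1), i = 1..m.
    mult = {lambda_k: m_k}.  Returns {(lambda_k, j): Z_{k, j+1}} for j = 0..m_k-1."""
    n = len(A)
    m = sum(mult.values())
    keys = [(lk, j) for lk, mk in mult.items() for j in range(mk)]
    # M[i][col]: the j-th derivative of lambda^i at lambda_k
    M = [[Fraction(factorial(i), factorial(i - j)) * Fraction(lk) ** (i - j) if i >= j
          else Fraction(0) for (lk, j) in keys] for i in range(m)]
    powers = [[[int(r == c) for c in range(n)] for r in range(n)]]
    for _ in range(1, m):
        powers.append(matmul(powers[-1], A))
    Z = {key: [[Fraction(0)] * n for _ in range(n)] for key in keys}
    for r in range(n):
        for c in range(n):        # one linear system per matrix entry
            sol = solve(M, [powers[i][r][c] for i in range(m)])
            for key, value in zip(keys, sol):
                Z[key][r][c] = value
    return Z


A = [[2, -1, 1], [0, 1, 1], [-1, 1, 1]]
Z = components(A, {1: 2, 2: 1})
```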


5. In conclusion, we mention that high powers $A^N$ of a matrix $A$ can be conveniently computed by formula (17) by setting $f(\lambda)$ equal to $\lambda^N$.¹⁴


Example: Given the matrix $A = \begin{pmatrix} 5 & -4 \\ 4 & -3 \end{pmatrix}$, it is required to compute the elements of $A^{100}$. The minimal polynomial of the matrix is $\psi(\lambda) = (\lambda - 1)^2$.


12 With coefficients not all equal to zero.
13 In the last example, $m = 3$, $g_1(\lambda) = 1$, $g_2(\lambda) = \lambda - 1$, $g_3(\lambda) = (\lambda - 1)^2$.
14 Formula (17) may also be used to compute the inverse matrix $A^{-1}$, by setting $f(\lambda) = \dfrac{1}{\lambda}$ or, what is the same, by setting $\lambda = 0$ in (21).


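The fundamental formula is
$$f(A) = f(1)Z_1 + f'(1)Z_2.$$
Replacing $f(\lambda)$ successively by 1 and $\lambda - 1$, we obtain:


$$Z_1 = E, \qquad Z_2 = A - E.$$
Therefore
$$f(A) = f(1)E + f'(1)(A - E).$$


Setting $f(\lambda) = \lambda^{100}$, we find


$$A^{100} = E + 100(A - E) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + 100\begin{pmatrix} 4 & -4 \\ 4 & -4 \end{pmatrix} = \begin{pmatrix} 401 & -400 \\ 400 & -399 \end{pmatrix}.$$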
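The same computation in a few lines of Python. Repeated squaring is used only as an independent cross-check; it is an illustrative choice and not part of the method in the text.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(M, e):
    """M**e by repeated squaring (used only as an independent check)."""
    n = len(M)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    while e:
        if e & 1:
            result = matmul(result, M)
        M = matmul(M, M)
        e >>= 1
    return result


A = [[5, -4], [4, -3]]          # psi(lambda) = (lambda - 1)^2
N = 100
E = [[1, 0], [0, 1]]
# f(A) = f(1) E + f'(1) (A - E) with f(lambda) = lambda^N: f(1) = 1, f'(1) = N
power_by_components = [[E[i][j] + N * (A[i][j] - E[i][j]) for j in range(2)]
                       for i in range(2)]
```

Only two scalar values of $f$ are needed here, whatever the exponent, because $m = 2$.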


§ 4. Representation of Functions of Matrices by means of Series 


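1. Let $A = \| a_{ik} \|_1^n$ be a matrix with the minimal polynomial (1):
$$\psi(\lambda) = (\lambda - \lambda_1)^{m_1}(\lambda - \lambda_2)^{m_2} \cdots (\lambda - \lambda_s)^{m_s} \qquad \Big(m = \sum_{k=1}^{s} m_k\Big).$$


Furthermore, let $f(\lambda)$ be a function and let $f_1(\lambda), f_2(\lambda), \ldots, f_p(\lambda), \ldots$ be a sequence of functions defined on the spectrum of $A$.


We shall say that the sequence of functions $f_p(\lambda)$ converges for $p \to \infty$ to some limit on the spectrum of $A$ if the limits


$$\lim_{p \to \infty} f_p(\lambda_k), \quad \lim_{p \to \infty} f_p'(\lambda_k), \quad \ldots, \quad \lim_{p \to \infty} f_p^{(m_k - 1)}(\lambda_k) \qquad (k = 1, 2, \ldots, s)$$


exist.
We shall say that the sequence of functions $f_p(\lambda)$ converges for $p \to \infty$ to the function $f(\lambda)$ on the spectrum of $A$, and we shall write


$$\lim_{p \to \infty} f_p(\Lambda_A) = f(\Lambda_A)$$
if
$$\lim_{p \to \infty} f_p(\lambda_k) = f(\lambda_k), \quad \lim_{p \to \infty} f_p'(\lambda_k) = f'(\lambda_k), \quad \ldots, \quad \lim_{p \to \infty} f_p^{(m_k - 1)}(\lambda_k) = f^{(m_k - 1)}(\lambda_k) \qquad (k = 1, 2, \ldots, s).$$
The fundamental formula


$$f(A) = \sum_{k=1}^{s}\left[f(\lambda_k)Z_{k1} + f'(\lambda_k)Z_{k2} + \cdots + f^{(m_k - 1)}(\lambda_k)Z_{k m_k}\right]$$


expresses $f(A)$ in terms of the values of $f(\lambda)$ on the spectrum of $A$. If we regard the matrix as a vector in a space $R_{n^2}$ of dimension $n^2$, then it follows from the fundamental formula, by the linear independence of the matrices $Z_{kj}$, that all the $f(A)$ (for given $A$) form an $m$-dimensional subspace of $R_{n^2}$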


§ 4. REPRESENTATION OF FUNCTIONS OF MATRICES BY SERIES 111


with basis $Z_{kj}$ $(j = 1, 2, \ldots, m_k;\ k = 1, 2, \ldots, s)$. In this basis the 'vector' $f(A)$ has as its coordinates the $m$ values of the function $f(\lambda)$ on the spectrum of $A$.

These considerations make the following theorem perfectly obvious: 


THEOREM 1: A sequence of matrices $f_p(A)$ converges for $p \to \infty$ to some limit if and only if the sequence $f_p(\lambda)$ converges for $p \to \infty$ on the spectrum of $A$ to a limit, i.e., the limits


$$\lim_{p \to \infty} f_p(A) \quad \text{and} \quad \lim_{p \to \infty} f_p(\Lambda_A)$$
always exist simultaneously. Moreover, the equation


$$\lim_{p \to \infty} f_p(\Lambda_A) = f(\Lambda_A) \tag{27}$$
implies that
$$\lim_{p \to \infty} f_p(A) = f(A), \tag{28}$$
and conversely.


Proof. 1) If the values of $f_p(\lambda)$ converge on the spectrum of $A$ for $p \to \infty$ to limit values, then from the formulas


$$f_p(A) = \sum_{k=1}^{s}\left[f_p(\lambda_k)Z_{k1} + f_p'(\lambda_k)Z_{k2} + \cdots + f_p^{(m_k - 1)}(\lambda_k)Z_{k m_k}\right] \tag{29}$$


there follows the existence of the limit $\lim_{p \to \infty} f_p(A)$. On the basis of this formula and of (17) we deduce (28) from (27).
2) Suppose, conversely, that $\lim_{p \to \infty} f_p(A)$ exists. Since the $m$ constituent matrices $Z_{kj}$ are linearly independent, we can express, by (29), the $m$ values of $f_p(\lambda)$ on the spectrum of $A$ (as linear forms) in terms of the $n^2$ elements of the matrix $f_p(A)$. Hence the existence of the limit $\lim_{p \to \infty} f_p(\Lambda_A)$ follows, and (27) holds in the presence of (28).


According to this theorem, if a sequence of polynomials $g_p(\lambda)$ $(p = 1, 2, \ldots)$ converges to the function $f(\lambda)$ on the spectrum of $A$, then


$$\lim_{p \to \infty} g_p(A) = f(A).$$


2. This formula underlines the naturalness and generality of our definition of $f(A)$. $f(A)$ is always obtained from the $g_p(A)$ by passing to the limit $p \to \infty$, provided only that the sequence of polynomials $g_p(\lambda)$ converges to $f(\lambda)$ on the spectrum of $A$. The latter condition is necessary for the existence of the limit $\lim_{p \to \infty} g_p(A)$.




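We shall say that the series $\sum_{p=0}^{\infty} u_p(\lambda)$ converges on the spectrum of $A$ to the function $f(\lambda)$ and we shall write
$$f(\Lambda_A) = \sum_{p=0}^{\infty} u_p(\Lambda_A), \tag{30}$$


if all the functions occurring here are defined on the spectrum of $A$ and the following equations hold:


$$f(\lambda_k) = \sum_{p=0}^{\infty} u_p(\lambda_k), \quad f'(\lambda_k) = \sum_{p=0}^{\infty} u_p'(\lambda_k), \quad \ldots, \quad f^{(m_k - 1)}(\lambda_k) = \sum_{p=0}^{\infty} u_p^{(m_k - 1)}(\lambda_k) \qquad (k = 1, 2, \ldots, s),$$


where the series on the right-hand sides of these equations converge. In other words, if we set


$$s_p(\lambda) = \sum_{q=0}^{p} u_q(\lambda) \qquad (p = 0, 1, 2, \ldots),$$


then (30) is equivalent to
$$f(\Lambda_A) = \lim_{p \to \infty} s_p(\Lambda_A). \tag{31}$$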


It is obvious that the theorem just proved can be stated in the following equivalent form:


THEOREM 1': The series $\sum_{p=0}^{\infty} u_p(A)$ converges to a matrix if and only if the series $\sum_{p=0}^{\infty} u_p(\lambda)$ converges on the spectrum of $A$. Moreover, the equation


$$f(\Lambda_A) = \sum_{p=0}^{\infty} u_p(\Lambda_A)$$
implies that


$$f(A) = \sum_{p=0}^{\infty} u_p(A),$$
and conversely.


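3. Suppose a power series is given with the circle of convergence $|\lambda - \lambda_0| < R$ and the sum $f(\lambda)$:
$$f(\lambda) = \sum_{p=0}^{\infty} \alpha_p(\lambda - \lambda_0)^p \qquad (|\lambda - \lambda_0| < R). \qquad (32)$$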




Since a power series may be differentiated term by term any number of 
times within the circle of convergence, (32) converges on the spectrum of 
any matrix whose characteristic values lie within the circle of convergence. 

Thus we have: 


THEOREM 2: If the function $f(\lambda)$ can be expanded in a power series in the circle $|\lambda - \lambda_0| < r$,
$$f(\lambda) = \sum_{p=0}^{\infty} \alpha_p(\lambda - \lambda_0)^p, \qquad (33)$$
then this expansion remains valid when the scalar argument $\lambda$ is replaced by a matrix $A$ whose characteristic values lie within the circle of convergence.


Note. In this theorem we may allow a characteristic value $\lambda_k$ of $A$ to fall on the circumference of the circle of convergence. We must then postulate in addition that the series (33), differentiated $m_k - 1$ times term by term, converges at the point $\lambda = \lambda_k$. It is well known that this already implies the convergence of the $j$ times differentiated series (33) at the point $\lambda_k$ to $f^{(j)}(\lambda_k)$ for $j = 0, 1, \ldots, m_k - 1$.

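The theorem just proved leads, for example, to the following expansions:¹⁵
$$e^A = \sum_{p=0}^{\infty}\frac{A^p}{p!}, \qquad \cos A = \sum_{p=0}^{\infty}\frac{(-1)^p}{(2p)!}A^{2p}, \qquad \sin A = \sum_{p=0}^{\infty}\frac{(-1)^p}{(2p+1)!}A^{2p+1},$$
$$\cosh A = \sum_{p=0}^{\infty}\frac{A^{2p}}{(2p)!}, \qquad \sinh A = \sum_{p=0}^{\infty}\frac{A^{2p+1}}{(2p+1)!},$$
$$(E - A)^{-1} = \sum_{p=0}^{\infty} A^p \qquad (|\lambda_k| < 1;\ k = 1, 2, \ldots, s),$$
$$\ln A = \sum_{p=1}^{\infty}\frac{(-1)^{p-1}}{p}(A - E)^p \qquad (|\lambda_k - 1| < 1;\ k = 1, 2, \ldots, s)$$
(by $\ln A$ we mean here the so-called principal value of the many-valued function $\operatorname{Ln}\lambda$, i.e., that branch for which $\operatorname{Ln} 1 = 0$).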
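The condition attached to the expansion of $(E - A)^{-1}$ concerns only the characteristic values, not the size of the entries of $A$. The Python sketch below sums the series for a hypothetical non-diagonalizable $A$ with the double characteristic value $1/2$ and an off-diagonal entry $2$. The exact inverse of $E - A$, computed by hand, is $\begin{Vmatrix}2 & 8\\ 0 & 2\end{Vmatrix}$. The helper names are illustrative.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def neumann_sum(A, terms):
    """Partial sum E + A + A**2 + ... + A**(terms - 1)."""
    n = len(A)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, A)
    return total


# Hypothetical matrix: characteristic values 1/2, 1/2 (so |lambda_k| < 1),
# even though the entry 2.0 makes a row sum of A larger than 1.
A = [[0.5, 2.0], [0.0, 0.5]]
S = neumann_sum(A, 200)
exact = [[2.0, 8.0], [0.0, 2.0]]   # (E - A)^{-1}
print(S)
```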
Let $G(u_1, u_2, \ldots, u_l)$ be a polynomial in $u_1, u_2, \ldots, u_l$; let $f_1(\lambda), f_2(\lambda), \ldots, f_l(\lambda)$ be functions of $\lambda$ defined on the spectrum of the matrix $A$, and let
$$g(\lambda) = G[f_1(\lambda), f_2(\lambda), \ldots, f_l(\lambda)].$$
Then from
$$g(\Lambda_A) = 0 \qquad (34)$$
there follows:
$$G[f_1(A), f_2(A), \ldots, f_l(A)] = O. \qquad (35)$$

15 The expansions in the first two rows hold for an arbitrary matrix $A$.

For let us denote by $r_1(\lambda), r_2(\lambda), \ldots, r_l(\lambda)$ the Lagrange-Sylvester interpolation polynomials for $f_1(\lambda), f_2(\lambda), \ldots, f_l(\lambda)$, and let us set
$$h(\lambda) = G[r_1(\lambda), r_2(\lambda), \ldots, r_l(\lambda)].$$
Since each $r_i(\lambda)$ coincides with $f_i(\lambda)$ on the spectrum of $A$, $h(\lambda)$ coincides there with $g(\lambda)$, and (34) implies
$$h(\Lambda_A) = 0. \qquad (36)$$
Hence it follows that
$$G[f_1(A), f_2(A), \ldots, f_l(A)] = G[r_1(A), r_2(A), \ldots, r_l(A)] = h(A) = O,$$
and this is what we had to show.
This result allows us to extend identities between functions of a scalar variable to matrix values of the argument.

For example, from
$$\cos^2\lambda + \sin^2\lambda = 1$$
we obtain for an arbitrary matrix $A$
$$\cos^2 A + \sin^2 A = E$$
(in this case $G(u_1, u_2) = u_1^2 + u_2^2 - 1$, $f_1(\lambda) = \cos\lambda$, and $f_2(\lambda) = \sin\lambda$).
Similarly, for every matrix $A$
$$e^A e^{-A} = E,$$
i.e.,
$$e^{-A} = (e^A)^{-1}.$$
Further, for every matrix $A$
$$e^{iA} = \cos A + i\sin A.$$
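These three identities can be checked numerically by summing the series for $e^A$, $\cos A$ and $\sin A$ given earlier in this section. The Python sketch below makes some assumptions. It uses a hypothetical non-diagonalizable $2\times2$ matrix and truncates every series after 40 terms. It uses Python's built-in complex numbers for $e^{iA}$.

```python
import math


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def power_series(A, coeff, terms=40):
    """Truncated series sum_{q < terms} coeff(q) * A**q."""
    n = len(A)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [[0.0] * n for _ in range(n)]
    for q in range(terms):
        c = coeff(q)
        total = [[total[i][j] + c * power[i][j] for j in range(n)]
                 for i in range(n)]
        power = mat_mul(power, A)
    return total


def exp_c(q):
    return 1.0 / math.factorial(q)


def cos_c(q):
    return (-1) ** (q // 2) / math.factorial(q) if q % 2 == 0 else 0.0


def sin_c(q):
    return (-1) ** (q // 2) / math.factorial(q) if q % 2 == 1 else 0.0


A = [[1.0, 1.0], [0.0, 1.0]]            # hypothetical, not diagonalizable
minus_A = [[-a for a in row] for row in A]
iA = [[1j * a for a in row] for row in A]

C, S = power_series(A, cos_c), power_series(A, sin_c)
pythagoras = mat_add(mat_mul(C, C), mat_mul(S, S))                       # expect E
product = mat_mul(power_series(A, exp_c), power_series(minus_A, exp_c))  # expect E
euler_lhs = power_series(iA, exp_c)                                      # e^{iA}
euler_rhs = [[c + 1j * s for c, s in zip(rc, rs)] for rc, rs in zip(C, S)]
print(pythagoras, product)
```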
Let $A$ be a non-singular matrix ($|A| \neq 0$). We denote by $\sqrt{\lambda}$ the single-valued branch of the many-valued function $\sqrt{\lambda}$ that is defined in a domain not containing the origin and containing all the characteristic values of $A$. Then $\sqrt{A}$ has a meaning. From $(\sqrt{\lambda})^2 - \lambda = 0$ it now follows that
$$(\sqrt{A})^2 = A.$$
Let $f(\lambda) = 1/\lambda$ and let $A = \|a_{ik}\|_1^n$ be a non-singular matrix. Then $f(\lambda)$ is defined on the spectrum of $A$, and in the equation
$$\lambda f(\lambda) = 1$$
we can therefore replace $\lambda$ by $A$:
$$A f(A) = E,$$
i.e.,¹⁶
$$f(A) = A^{-1}.$$
Denoting by $r(\lambda)$ the interpolation polynomial for the function $1/\lambda$, we may represent the inverse matrix $A^{-1}$ in the form of a polynomial in $A$:
$$A^{-1} = r(A).$$

16 We have already made use of this on p. 108. See footnote 10.
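As a concrete instance, take the coefficient matrix of the worked example in § 5, $A = \begin{Vmatrix}3&-1&1\\2&0&1\\1&-1&2\end{Vmatrix}$. Its minimal polynomial is $(\lambda - 1)(\lambda - 2)^2$. The interpolation conditions $r(1) = 1$, $r(2) = \tfrac12$, $r'(2) = -\tfrac14$ give $r(\lambda) = \tfrac12 - \tfrac14(\lambda - 2) + \tfrac14(\lambda - 2)^2$. The Python sketch below forms $r(A)$ in exact rational arithmetic and checks that it is $A^{-1}$. The variable names are illustrative.

```python
from fractions import Fraction


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def r(lam):
    """Interpolation polynomial for 1/lambda on the spectrum {1, 2, 2}."""
    return Fraction(1, 2) - Fraction(1, 4) * (lam - 2) + Fraction(1, 4) * (lam - 2) ** 2


def r_prime(lam):
    return -Fraction(1, 4) + Fraction(1, 2) * (lam - 2)


# r agrees with 1/lambda on the spectrum: r(1) = 1, r(2) = 1/2, r'(2) = -1/4.
assert r(1) == 1 and r(2) == Fraction(1, 2) and r_prime(2) == Fraction(-1, 4)

A = [[3, -1, 1], [2, 0, 1], [1, -1, 2]]
E = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
N = [[A[i][j] - 2 * E[i][j] for j in range(3)] for i in range(3)]   # A - 2E
N2 = mat_mul(N, N)                                                  # (A - 2E)^2

A_inv = [[Fraction(1, 2) * E[i][j] - Fraction(1, 4) * N[i][j] + Fraction(1, 4) * N2[i][j]
          for j in range(3)] for i in range(3)]
print(A_inv)
```

Only the values of $1/\lambda$ on the spectrum enter, so the same recipe works for any non-singular matrix whose minimal polynomial is known.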
Let us consider a rational function $\varphi(\lambda) = g(\lambda)/h(\lambda)$, where $g(\lambda)$ and $h(\lambda)$ are co-prime polynomials in $\lambda$. This function is defined on the spectrum of $A$ if and only if the characteristic values of $A$ are not roots of $h(\lambda)$, i.e.,¹⁷ if $|h(A)| \neq 0$. Under this assumption we may replace $\lambda$ by $A$ in the identity
$$\varphi(\lambda)h(\lambda) = g(\lambda),$$
obtaining
$$\varphi(A)h(A) = g(A).$$
Hence, since the polynomials $g(A)$ and $h(A)$ in $A$ commute,
$$\varphi(A) = g(A)[h(A)]^{-1} = [h(A)]^{-1}g(A). \qquad (37)$$


Notes. 1) If $\mathbf{A}$ is a linear operator in an $n$-dimensional space $R$, then $f(\mathbf{A})$ is defined exactly like $f(A)$:
$$f(\mathbf{A}) = r(\mathbf{A}),$$
where $r(\lambda)$ is the Lagrange-Sylvester interpolation polynomial for $f(\lambda)$ on the spectrum of the operator $\mathbf{A}$. The spectrum of $\mathbf{A}$ is determined by the minimal annihilating polynomial $\psi(\lambda)$ of $\mathbf{A}$.

According to this definition, if the matrix $A = \|a_{ik}\|_1^n$ corresponds to the operator $\mathbf{A}$ in some basis of the space, then in the same basis the matrix $f(A)$ corresponds to the operator $f(\mathbf{A})$. All the statements of this chapter in which a matrix $A$ occurs remain valid after replacement of the matrix $A$ by the operator $\mathbf{A}$.


2) We can also define¹⁸ a function of a matrix $f(A)$ starting from the characteristic polynomial
$$\Delta(\lambda) = \prod_{k=1}^{s}(\lambda - \lambda_k)^{n_k}$$
instead of the minimal polynomial
$$\psi(\lambda) = \prod_{k=1}^{s}(\lambda - \lambda_k)^{m_k}.$$


17 See (25) on p. 84. 
18 See, for example, MacMillan, W. D., Dynamics of Rigid Bodies (New York, 1936). 




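We have then to set $f(A) = g(A)$, where $g(\lambda)$ is an interpolation polynomial of degree less than $n$ modulo $\Delta(\lambda)$ of the function $f(\lambda)$.¹⁹ The formulas (17), (21), and (23) are to be replaced by the following:²⁰
$$f(A) = \sum_{k=1}^{s}\left[f(\lambda_k)Z_{k1} + f'(\lambda_k)Z_{k2} + \cdots + f^{(n_k-1)}(\lambda_k)Z_{kn_k}\right], \qquad (17')$$
$$(\lambda E - A)^{-1} = \frac{B(\lambda)}{\Delta(\lambda)} = \sum_{k=1}^{s}\left[\frac{Z_{k1}}{\lambda - \lambda_k} + \frac{1!\,Z_{k2}}{(\lambda - \lambda_k)^2} + \cdots + \frac{(n_k - 1)!\,Z_{kn_k}}{(\lambda - \lambda_k)^{n_k}}\right], \qquad (21')$$
$$f(A) = \sum_{k=1}^{s}\frac{1}{(n_k - 1)!}\left[f(\lambda)\frac{B(\lambda)}{\Delta_k(\lambda)}\right]^{(n_k - 1)}_{\lambda = \lambda_k}, \qquad (23')$$
where
$$\Delta_k(\lambda) = \frac{\Delta(\lambda)}{(\lambda - \lambda_k)^{n_k}} \qquad (k = 1, 2, \ldots, s).$$
However, in (17') the values $f^{(m_k)}(\lambda_k), f^{(m_k+1)}(\lambda_k), \ldots, f^{(n_k-1)}(\lambda_k)$ occur only fictitiously. A comparison of (21) with (21') shows that $Z_{k1}, \ldots, Z_{km_k}$ are the same matrices as in (21), while
$$Z_{k,m_k+1} = \cdots = Z_{kn_k} = O \qquad (k = 1, 2, \ldots, s).$$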
§ 5. Application of a Function of a Matrix to the Integration of a System 
of Linear Differential Equations with Constant Coefficients 


1. We begin by considering a system of homogeneous linear differential equations of the first order with constant coefficients:
$$\begin{aligned}\frac{dx_1}{dt} &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n,\\ \frac{dx_2}{dt} &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n,\\ &\;\;\vdots\\ \frac{dx_n}{dt} &= a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n,\end{aligned} \qquad (38)$$
where $t$ is the independent variable, $x_1, x_2, \ldots, x_n$ are unknown functions of $t$, and $a_{ik}$ $(i, k = 1, 2, \ldots, n)$ are complex numbers.
We introduce the square matrix $A = \|a_{ik}\|_1^n$ of the coefficients and the column matrix $x = (x_1, x_2, \ldots, x_n)$. Then the system (38) can be written in the form of a single matrix differential equation
$$\frac{dx}{dt} = Ax. \qquad (39)$$

19 The polynomial $g(\lambda)$ is not uniquely determined by the equation $f(A) = g(A)$ and the condition 'degree less than $n$.'

20 The special case of (23') in which $f(\lambda) = \lambda^m$ is sometimes called Perron's formula (see [40], pp. 25-27).


Here, and in what follows, by the derivative of a matrix we mean the matrix obtained from the given one by replacing all its elements by their derivatives. Therefore $\frac{dx}{dt}$ is the column matrix with the elements $\frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt}$.
We shall seek a solution of the system of differential equations satisfying the following initial conditions:
$$x_1|_{t=0} = x_{10},\quad x_2|_{t=0} = x_{20},\quad \ldots,\quad x_n|_{t=0} = x_{n0},$$
or, briefly,
$$x|_{t=0} = x_0. \qquad (40)$$

Let us expand the unknown column $x$ into a Maclaurin series in powers of $t$:
$$x = x_0 + \dot{x}_0 t + \ddot{x}_0\frac{t^2}{2!} + \cdots \qquad \left(\dot{x}_0 = \frac{dx}{dt}\Big|_{t=0},\ \ddot{x}_0 = \frac{d^2x}{dt^2}\Big|_{t=0},\ \ldots\right). \qquad (41)$$
Then by successive differentiations we find from (39):
$$\frac{d^2x}{dt^2} = A\frac{dx}{dt} = A^2x,\qquad \frac{d^3x}{dt^3} = A\frac{d^2x}{dt^2} = A^3x,\ \ldots \qquad (42)$$
Substituting the value $t = 0$ in (39) and (42), we obtain:
$$\dot{x}_0 = Ax_0,\qquad \ddot{x}_0 = A^2x_0,\ \ldots$$
Now the series (41) can be written as follows:
$$x = x_0 + tAx_0 + \frac{t^2}{2!}A^2x_0 + \cdots = e^{At}x_0. \qquad (43)$$
By direct substitution in (39) we see²¹ that (43) is a solution of the differential equation (39). Setting $t = 0$ in (43), we find:
$$x|_{t=0} = x_0.$$
Thus, the formula (43) gives us the solution of the given system of differential equations satisfying the initial conditions (40).
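To see formula (43) at work numerically, the Python sketch below sums the series $x_0 + tAx_0 + \frac{t^2}{2!}A^2x_0 + \cdots$ term by term. It then compares the result with a classical fourth-order Runge-Kutta integration of (39), which is a swapped-in reference method and not one used in the text. The data $A = \begin{Vmatrix}0&1\\-2&-3\end{Vmatrix}$, $x_0 = (1, 0)$ and $t = 1.5$ are hypothetical. For them the solution is $x_1 = 2e^{-t} - e^{-2t}$, $x_2 = -2e^{-t} + 2e^{-2t}$.

```python
import math


def mat_vec(M, v):
    """Product of a matrix (list of rows) and a column (list)."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def series_solution(A, x0, t, terms=60):
    """Partial sum of (43): x0 + t A x0 + t^2/2! A^2 x0 + ..."""
    term = list(x0)
    total = list(x0)
    for q in range(1, terms):
        term = [t / q * c for c in mat_vec(A, term)]   # t^q A^q x0 / q!
        total = [a + b for a, b in zip(total, term)]
    return total


def rk4_solution(A, x0, t, steps=1000):
    """Classical Runge-Kutta integration of dx/dt = A x from 0 to t."""
    h = t / steps
    x = list(x0)
    for _ in range(steps):
        k1 = mat_vec(A, x)
        k2 = mat_vec(A, [xi + h / 2 * k for xi, k in zip(x, k1)])
        k3 = mat_vec(A, [xi + h / 2 * k for xi, k in zip(x, k2)])
        k4 = mat_vec(A, [xi + h * k for xi, k in zip(x, k3)])
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


A = [[0.0, 1.0], [-2.0, -3.0]]     # hypothetical coefficient matrix
x0 = [1.0, 0.0]
t = 1.5
x_series = series_solution(A, x0, t)
x_rk4 = rk4_solution(A, x0, t)
print(x_series, x_rk4)
```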
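Let us set $f(\lambda) = e^{\lambda t}$ in (17). Then
$$e^{At} = \|q_{ik}(t)\|_1^n = \sum_{k=1}^{s}\left(Z_{k1} + Z_{k2}t + \cdots + Z_{km_k}t^{m_k - 1}\right)e^{\lambda_k t}. \qquad (44)$$

21 $\frac{d}{dt}\left(e^{At}\right) = \frac{d}{dt}\left(E + At + \frac{A^2t^2}{2!} + \frac{A^3t^3}{3!} + \cdots\right) = A + A^2t + \frac{A^3t^2}{2!} + \cdots = Ae^{At}.$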




The solution (43) may then be written in the following form:
$$\begin{aligned}x_1 &= q_{11}(t)x_{10} + q_{12}(t)x_{20} + \cdots + q_{1n}(t)x_{n0},\\ x_2 &= q_{21}(t)x_{10} + q_{22}(t)x_{20} + \cdots + q_{2n}(t)x_{n0},\\ &\;\;\vdots\\ x_n &= q_{n1}(t)x_{10} + q_{n2}(t)x_{20} + \cdots + q_{nn}(t)x_{n0},\end{aligned} \qquad (45)$$
where $x_{10}, x_{20}, \ldots, x_{n0}$ are constants equal to the initial values of the unknown functions $x_1, x_2, \ldots, x_n$.
Thus, the integration of the given system of differential equations reduces to the computation of the elements of the matrix $e^{At}$.
If $t = t_0$ is taken as the initial value of the argument, then (43) is to be replaced by the formula
$$x = e^{A(t - t_0)}x_0. \qquad (46)$$
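Example.
$$\begin{aligned}\frac{dx_1}{dt} &= 3x_1 - x_2 + x_3,\\ \frac{dx_2}{dt} &= 2x_1 + x_3,\\ \frac{dx_3}{dt} &= x_1 - x_2 + 2x_3.\end{aligned}$$
The coefficient matrix is
$$A = \begin{Vmatrix}3 & -1 & 1\\ 2 & 0 & 1\\ 1 & -1 & 2\end{Vmatrix}.$$
We form the characteristic determinant
$$\Delta(\lambda) = -\begin{vmatrix}3 - \lambda & -1 & 1\\ 2 & -\lambda & 1\\ 1 & -1 & 2 - \lambda\end{vmatrix} = (\lambda - 1)(\lambda - 2)^2.$$
The greatest common divisor of the minors of order 2 is $D_2(\lambda) = 1$. Therefore
$$\psi(\lambda) = \Delta(\lambda) = (\lambda - 1)(\lambda - 2)^2.$$
The fundamental formula is
$$f(A) = f(1)Z_1 + f(2)Z_2 + f'(2)Z_3.$$
For $f(\lambda)$ we choose in succession $1$, $\lambda - 2$, $(\lambda - 2)^2$. We obtain:
$$Z_1 + Z_2 = E = \begin{Vmatrix}1&0&0\\0&1&0\\0&0&1\end{Vmatrix},$$
$$-Z_1 + Z_3 = A - 2E = \begin{Vmatrix}1&-1&1\\2&-2&1\\1&-1&0\end{Vmatrix},$$
$$Z_1 = (A - 2E)^2 = \begin{Vmatrix}0&0&0\\-1&1&0\\-1&1&0\end{Vmatrix}.$$
Hence we determine $Z_1, Z_2, Z_3$ and substitute in the fundamental formula:
$$f(A) = f(1)\begin{Vmatrix}0&0&0\\-1&1&0\\-1&1&0\end{Vmatrix} + f(2)\begin{Vmatrix}1&0&0\\1&0&0\\1&-1&1\end{Vmatrix} + f'(2)\begin{Vmatrix}1&-1&1\\1&-1&1\\0&0&0\end{Vmatrix}.$$
If we now replace $f(\lambda)$ by $e^{\lambda t}$, we obtain:
$$e^{At} = e^{t}\begin{Vmatrix}0&0&0\\-1&1&0\\-1&1&0\end{Vmatrix} + e^{2t}\begin{Vmatrix}1&0&0\\1&0&0\\1&-1&1\end{Vmatrix} + te^{2t}\begin{Vmatrix}1&-1&1\\1&-1&1\\0&0&0\end{Vmatrix} = \begin{Vmatrix}(1+t)e^{2t} & -te^{2t} & te^{2t}\\ -e^{t} + (1+t)e^{2t} & e^{t} - te^{2t} & te^{2t}\\ -e^{t} + e^{2t} & e^{t} - e^{2t} & e^{2t}\end{Vmatrix}.$$
Thus
$$\begin{aligned}x_1 &= C_1(1+t)e^{2t} - C_2te^{2t} + C_3te^{2t},\\ x_2 &= C_1[-e^{t} + (1+t)e^{2t}] + C_2(e^{t} - te^{2t}) + C_3te^{2t},\\ x_3 &= C_1(-e^{t} + e^{2t}) + C_2(e^{t} - e^{2t}) + C_3e^{2t},\end{aligned}$$
where
$$C_1 = x_{10},\quad C_2 = x_{20},\quad C_3 = x_{30}.$$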
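The computations of this example are easy to check by machine. The Python sketch below is an illustrative check, not part of the original text. It forms $Z_1 = (A - 2E)^2$, $Z_2 = E - Z_1$ and $Z_3 = (A - 2E) + Z_1$ from the three relations above. It then assembles $e^{At} = e^{t}Z_1 + e^{2t}Z_2 + te^{2t}Z_3$ and compares it with the truncated series $\sum_q A^qt^q/q!$ at the arbitrary value $t = 0.7$.

```python
import math

A = [[3, -1, 1], [2, 0, 1], [1, -1, 2]]
E = [[1 if i == j else 0 for j in range(3)] for i in range(3)]


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def lin_comb(*pairs):
    """Sum of c * M over the (c, M) pairs given."""
    return [[sum(c * M[i][j] for c, M in pairs) for j in range(3)]
            for i in range(3)]


N = lin_comb((1, A), (-2, E))        # A - 2E
Z1 = mat_mul(N, N)                   # from f = (lambda - 2)^2
Z2 = lin_comb((1, E), (-1, Z1))      # from f = 1
Z3 = lin_comb((1, N), (1, Z1))       # from f = lambda - 2


def exp_At_closed(t):
    """Fundamental formula with f(lambda) = e^{lambda t}."""
    return lin_comb((math.exp(t), Z1), (math.exp(2 * t), Z2),
                    (t * math.exp(2 * t), Z3))


def exp_At_series(t, terms=60):
    total = [[0.0] * 3 for _ in range(3)]
    power = E
    for q in range(terms):
        total = lin_comb((1, total), (t ** q / math.factorial(q), power))
        power = mat_mul(power, A)
    return total


t = 0.7
M = exp_At_closed(t)
diff = max(abs(a - b) for ra, rb in zip(M, exp_At_series(t))
           for a, b in zip(ra, rb))
print(Z1, Z2, Z3, diff)
```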


2. We now consider a system of inhomogeneous linear differential equations with constant coefficients:
$$\begin{aligned}\frac{dx_1}{dt} &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n + f_1(t),\\ \frac{dx_2}{dt} &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n + f_2(t),\\ &\;\;\vdots\\ \frac{dx_n}{dt} &= a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n + f_n(t),\end{aligned} \qquad (47)$$
where $f_i(t)$ $(i = 1, 2, \ldots, n)$ are continuous functions in the interval $t_0 \le t \le t_1$. Denoting by $f(t)$ the column matrix with the elements $f_1(t), f_2(t), \ldots, f_n(t)$ and again setting $A = \|a_{ik}\|_1^n$, we write the system (47) as follows:


$$\frac{dx}{dt} = Ax + f(t). \qquad (48)$$
We replace $x$ by a new column $z$ of unknown functions, connected with $x$ by the relation
$$x = e^{At}z. \qquad (49)$$
Differentiating (49) term by term²² and substituting the expression for $\frac{dx}{dt}$ in (48), we find:
$$e^{At}\frac{dz}{dt} = f(t). \qquad (50)$$
Hence²³
$$z(t) = c + \int_{t_0}^{t} e^{-A\tau}f(\tau)\,d\tau \qquad (51)$$
and so by (49)
$$x = e^{At}\left[c + \int_{t_0}^{t} e^{-A\tau}f(\tau)\,d\tau\right] = e^{At}c + \int_{t_0}^{t} e^{A(t - \tau)}f(\tau)\,d\tau, \qquad (52)$$
where $c$ is a column with arbitrary constant elements.
When we give to the argument $t$ in (52) the value $t_0$, we find $c = e^{-At_0}x_0$, so that (52) can be written in the following form:
$$x = e^{A(t - t_0)}x_0 + \int_{t_0}^{t} e^{A(t - \tau)}f(\tau)\,d\tau. \qquad (53)$$

22 See footnote 21.

23 If a matrix function of a scalar argument is given, $B(t) = \|b_{ik}(t)\|$ $(i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n;\ t_0 \le t \le t_1)$, then the integral $\int_{t_0}^{t_1} B(\tau)\,d\tau$ is defined in the natural way:
$$\int_{t_0}^{t_1} B(\tau)\,d\tau = \left\|\int_{t_0}^{t_1} b_{ik}(\tau)\,d\tau\right\| \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n).$$
ty ty 




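Setting $e^{At} = \|q_{ik}(t)\|_1^n$, we can write the solution (53) in expanded form:
$$\begin{aligned}x_1 &= q_{11}(t - t_0)x_{10} + \cdots + q_{1n}(t - t_0)x_{n0} + \int_{t_0}^{t}\left[q_{11}(t - \tau)f_1(\tau) + \cdots + q_{1n}(t - \tau)f_n(\tau)\right]d\tau,\\ &\;\;\vdots\\ x_n &= q_{n1}(t - t_0)x_{10} + \cdots + q_{nn}(t - t_0)x_{n0} + \int_{t_0}^{t}\left[q_{n1}(t - \tau)f_1(\tau) + \cdots + q_{nn}(t - \tau)f_n(\tau)\right]d\tau.\end{aligned} \qquad (54)$$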
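Formula (53) can be evaluated directly once $e^{At}$ is known. The Python sketch below uses hypothetical data, with Simpson's rule as the assumed quadrature for the integral in (53):

* $A = \begin{Vmatrix}0&1\\-1&0\end{Vmatrix}$, for which $e^{As} = \begin{Vmatrix}\cos s&\sin s\\-\sin s&\cos s\end{Vmatrix}$;
* $f(t) = (0, 1)$, $x_0 = (0, 0)$ and $t_0 = 0.5$.

The result is compared with the exact solution $x_1 = 1 - \cos(t - t_0)$, $x_2 = \sin(t - t_0)$.

```python
import math


def exp_As(s):
    """e^{As} for A = [[0, 1], [-1, 0]], written in closed form."""
    return [[math.cos(s), math.sin(s)], [-math.sin(s), math.cos(s)]]


def mat_vec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def f(tau):
    """Hypothetical forcing column f(t) = (0, 1)."""
    return [0.0, 1.0]


def solve_53(x0, t0, t, n=200):
    """x = e^{A(t-t0)} x0 + integral_{t0}^{t} e^{A(t-tau)} f(tau) dtau."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of panels")
    h = (t - t0) / n
    integral = [0.0, 0.0]
    for k in range(n + 1):
        tau = t0 + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        v = mat_vec(exp_As(t - tau), f(tau))
        integral = [acc + weight * vi for acc, vi in zip(integral, v)]
    integral = [h / 3 * acc for acc in integral]
    homogeneous = mat_vec(exp_As(t - t0), x0)
    return [a + b for a, b in zip(homogeneous, integral)]


t0, t = 0.5, 2.0
x = solve_53([0.0, 0.0], t0, t)
exact = [1 - math.cos(t - t0), math.sin(t - t0)]
print(x, exact)
```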


3. As an example we consider the motion of a heavy material point in a vacuum near the surface of the earth, taking the motion of the earth into account. It is known²⁴ that in this case the acceleration of the point relative to the earth is determined by the constant force of gravity $mg$ and the inertial Coriolis force $-2m\,\omega\times v$. Here $v$ is the velocity of the point relative to the earth, and $\omega$ is the constant angular velocity of the earth. Therefore the differential equation of motion of the point has the form²⁵
$$\frac{dv}{dt} = g - 2\omega\times v. \qquad (55)$$


We define a linear operator $A$ in three-dimensional euclidean space by the equation
$$Ax = -2\omega\times x \qquad (56)$$
and write instead of (55)
$$\frac{dv}{dt} = Av + g. \qquad (57)$$
Comparing (57) with (48), we easily find from (53):
$$v = e^{At}v_0 + \int_0^t e^{A\tau}\,d\tau\; g \qquad (v_0 = v|_{t=0}).$$
Integrating term by term, we determine the radius vector of the motion of the point:
$$r = r_0 + \int_0^t e^{A\tau}\,d\tau\; v_0 + \int_0^t\!\!\int_0^{\tau} e^{A\sigma}\,d\sigma\,d\tau\; g, \qquad (58)$$
where
$$r_0 = r|_{t=0}\quad\text{and}\quad v_0 = v|_{t=0}.$$


24 See A. Sommerfeld, Lectures on Theoretical Physics, Vol. 1 (Mechanics), § 30. 
25 Here the symbol $\times$ denotes the vector product.




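Substituting for $e^{A\tau}$ the series
$$e^{A\tau} = E + A\tau + \frac{A^2\tau^2}{2!} + \cdots$$
and replacing $A$ by its expression from (56), we have:
$$r = r_0 + v_0t + g\frac{t^2}{2} - \omega\times\left(v_0t^2 + \frac{1}{3}gt^3\right) + \omega\times\left[\omega\times\left(\frac{2}{3}v_0t^3 + \frac{1}{6}gt^4\right)\right] + \cdots$$
The angular velocity $\omega$ is small (for the earth, $\omega \approx 7.3\times10^{-5}\ \text{sec}^{-1}$), so we neglect the terms containing the second and higher powers of $\omega$. For the additional displacement $d$ of the point due to the motion of the earth we then obtain the approximate formula
$$d = -\omega\times\left(v_0t^2 + \frac{1}{3}gt^3\right).$$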


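Returning to the exact solution (58), let us compute $e^{At}$. As a preliminary we establish that the minimal polynomial of the operator $A$ has the form
$$\psi(\lambda) = \lambda(\lambda^2 + 4\omega^2).$$
For we find from (56), writing $(\omega x)$ for the scalar product and $\omega^2 = (\omega\omega)$,
$$A^2x = 4\omega\times(\omega\times x) = 4(\omega x)\omega - 4\omega^2x,$$
$$A^3x = -2\omega\times A^2x = 8\omega^2(\omega\times x).$$
Hence and from (56) it follows that the operators $E$, $A$, $A^2$ are linearly independent and that
$$A^3 + 4\omega^2A = O.$$
The minimal polynomial $\psi(\lambda)$ has the simple roots $0$, $2\omega i$, $-2\omega i$. The Lagrange interpolation formula for $e^{\lambda t}$ has the form
$$1 + \frac{\sin 2\omega t}{2\omega}\lambda + \frac{1 - \cos 2\omega t}{4\omega^2}\lambda^2.$$
Then
$$e^{At} = E + \frac{\sin 2\omega t}{2\omega}A + \frac{1 - \cos 2\omega t}{4\omega^2}A^2.$$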
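The relation $A^3 + 4\omega^2A = O$ and the closed form $e^{At} = E + \frac{\sin 2\omega t}{2\omega}A + \frac{1 - \cos 2\omega t}{4\omega^2}A^2$ can be checked numerically. The Python sketch below writes the operator (56) as a matrix in an orthonormal basis, using the standard cross-product matrix of $\omega$. The vector $\omega = (0, 0.3, 0.4)$, so that $|\omega| = 0.5$, is a hypothetical value. It is far larger than the earth's rate and is chosen only to make the check non-trivial.

```python
import math


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def lin_comb(*pairs):
    """Sum of c * M over the (c, M) pairs given."""
    return [[sum(c * M[i][j] for c, M in pairs) for j in range(3)]
            for i in range(3)]


w = (0.0, 0.3, 0.4)                       # hypothetical angular velocity
w2 = sum(c * c for c in w)                # omega^2 = (omega omega)
om = math.sqrt(w2)
# Matrix of x -> w x x; the operator (56) is A = -2 * cross.
cross = [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]
A = lin_comb((-2.0, cross))
E = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
A2 = mat_mul(A, A)
A3 = mat_mul(A2, A)
residual = lin_comb((1.0, A3), (4 * w2, A))   # A^3 + 4 omega^2 A, expect O


def exp_closed(t):
    return lin_comb((1.0, E), (math.sin(2 * om * t) / (2 * om), A),
                    ((1 - math.cos(2 * om * t)) / (4 * w2), A2))


def exp_series(t, terms=60):
    total, power = lin_comb((0.0, E)), E
    for q in range(terms):
        total = lin_comb((1.0, total), (t ** q / math.factorial(q), power))
        power = mat_mul(power, A)
    return total


t = 3.0
gap = max(abs(a - b) for ra, rb in zip(exp_closed(t), exp_series(t))
          for a, b in zip(ra, rb))
print(residual, gap)
```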


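Substituting this expression for $e^{A\tau}$ in (58) and replacing the operator $A$ by its expression from (56), we find
$$r = r_0 + v_0t + g\frac{t^2}{2} - \omega\times\left(\frac{1 - \cos 2\omega t}{2\omega^2}v_0 + \frac{2\omega t - \sin 2\omega t}{4\omega^3}g\right) + \omega\times\left[\omega\times\left(\frac{2\omega t - \sin 2\omega t}{2\omega^3}v_0 + \frac{2\omega^2t^2 - 1 + \cos 2\omega t}{4\omega^4}g\right)\right]. \qquad (59)$$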




Let us consider the special case $v_0 = 0$. When we expand the triple vector product we obtain:
$$r = r_0 + g\frac{t^2}{2} + \frac{2\omega t - \sin 2\omega t}{4\omega^3}(g\times\omega) + \frac{\cos 2\omega t - 1 + 2\omega^2t^2}{4\omega^2}\,g\cos\varphi\; e,$$
where $\varphi$ is the geographical latitude of the point whose motion we are considering, and $e$ is the unit vector in the meridian plane perpendicular to the earth's axis and directed away from it. The term
$$\frac{2\omega t - \sin 2\omega t}{4\omega^3}(g\times\omega)$$
represents the eastward displacement perpendicular to the plane of the meridian. The last term on the right-hand side of the last formula gives the displacement in the meridian plane perpendicular to, and away from, the earth's axis.
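To get a feeling for the size of the eastward term, the Python sketch below evaluates its magnitude $\frac{2\omega t - \sin 2\omega t}{4\omega^3}\,g\,\omega\cos\varphi$, using $|g\times\omega| = g\omega\cos\varphi$. It compares this with the first-order value $\frac13 g\omega t^3\cos\varphi$, which is the magnitude of $-\omega\times\frac13 gt^3$ obtained above for $v_0 = 0$. The fall time $t = 4.5$ sec and the latitude $\varphi = 50^\circ$ are hypothetical inputs. The values $\omega = 7.29\times10^{-5}\ \text{sec}^{-1}$ and $g = 9.81\ \text{m/sec}^2$ are the usual ones.

```python
import math

OMEGA = 7.29e-5      # angular velocity of the earth, 1/sec
G = 9.81             # gravitational acceleration, m/sec^2


def eastward_exact(t, phi):
    """Magnitude of (2wt - sin 2wt) / (4 w^3) * (g x w)."""
    x = 2 * OMEGA * t
    return (x - math.sin(x)) / (4 * OMEGA ** 3) * G * OMEGA * math.cos(phi)


def eastward_first_order(t, phi):
    """Magnitude of the first-order term (t^3 / 3) * (g x w)."""
    return G * OMEGA * t ** 3 * math.cos(phi) / 3


t, phi = 4.5, math.radians(50.0)     # hypothetical fall time and latitude
exact = eastward_exact(t, phi)
approx = eastward_first_order(t, phi)
print(f"{exact * 100:.3f} cm vs {approx * 100:.3f} cm")
```

The two values agree to many digits because the neglected terms are of relative order $(2\omega t)^2$. This is why the first-order formula is adequate for the earth.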


4. Suppose now that the following system of linear differential equations of the second order is given:
$$\begin{aligned}\frac{d^2x_1}{dt^2} + a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0,\\ \frac{d^2x_2}{dt^2} + a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0,\\ &\;\;\vdots\\ \frac{d^2x_n}{dt^2} + a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= 0,\end{aligned} \qquad (60)$$
where the $a_{ik}$ $(i, k = 1, 2, \ldots, n)$ are constant coefficients. Introducing again the column $x = (x_1, x_2, \ldots, x_n)$ and the square matrix $A = \|a_{ik}\|_1^n$, we rewrite (60) in matrix form
$$\frac{d^2x}{dt^2} + Ax = 0. \qquad (60')$$
We consider, to begin with, the case in which $|A| \neq 0$. If $n = 1$, i.e., if $x$ and $A$ are scalars and $A \neq 0$, the general solution of the equation (60) can be written in the form
$$x = \cos(\sqrt{A}\,t)\,x_0 + (\sqrt{A})^{-1}\sin(\sqrt{A}\,t)\,\dot{x}_0, \qquad (61)$$
where $x_0 = x|_{t=0}$ and $\dot{x}_0 = \frac{dx}{dt}\Big|_{t=0}$.
By direct verification we see that (61) is a solution of (60') for arbitrary $n$, where $x$ is a column and $A$ a non-singular square matrix.²⁶ Here we use the formulas
$$\cos(\sqrt{A}\,t) = E - \frac{At^2}{2!} + \frac{A^2t^4}{4!} - \cdots,\qquad (\sqrt{A})^{-1}\sin(\sqrt{A}\,t) = Et - \frac{At^3}{3!} + \frac{A^2t^5}{5!} - \cdots. \qquad (62)$$

26 By $\sqrt{A}$ we mean a matrix whose square is equal to $A$. As we know, $\sqrt{A}$ exists when $|A| \neq 0$ (see p. 114).


Formula (61) comprises all solutions of the system (60) or (60'), as the initial values $x_0$ and $\dot{x}_0$ may be chosen arbitrarily.

The right-hand sides of the formulas (62) have a meaning even when $|A| = 0$. Therefore (61) is the general solution of the given system of differential equations also when $|A| = 0$. This holds provided only that the functions $\cos(\sqrt{A}\,t)$ and $(\sqrt{A})^{-1}\sin(\sqrt{A}\,t)$ occurring in it are interpreted as the right-hand sides of the formulas (62).

We leave it to the reader to verify that the solution of the inhomogeneous system
$$\frac{d^2x}{dt^2} + Ax = f(t) \qquad (63)$$
satisfying the initial conditions $x|_{t=0} = x_0$ and $\frac{dx}{dt}\Big|_{t=0} = \dot{x}_0$ can be written in the form
$$x = \cos(\sqrt{A}\,t)\,x_0 + (\sqrt{A})^{-1}\sin(\sqrt{A}\,t)\,\dot{x}_0 + (\sqrt{A})^{-1}\int_0^t \sin[\sqrt{A}(t - \tau)]\,f(\tau)\,d\tau. \qquad (64)$$
If $t = t_0$ is taken as the initial time, then in (61) and (64) $\cos(\sqrt{A}\,t)$ and $\sin(\sqrt{A}\,t)$ must be replaced by $\cos(\sqrt{A}(t - t_0))$ and $\sin(\sqrt{A}(t - t_0))$, and $\int_0^t$ by $\int_{t_0}^t$.
In the special case
$$f(t) = h\sin(pt + \alpha)$$
($h$ is a constant column, and $p$ and $\alpha$ are numbers), (64) can be replaced by:
$$x = \cos(\sqrt{A}\,t)\,c + (\sqrt{A})^{-1}\sin(\sqrt{A}\,t)\,d + (A - p^2E)^{-1}h\sin(pt + \alpha),$$
where $c$ and $d$ are columns with arbitrary constant elements. This formula has meaning when $p^2$ is not a characteristic value of the matrix $A$ ($|A - p^2E| \neq 0$).
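The remark that (61) stays valid for $|A| = 0$, with (62) read as power series, can be tested numerically. The Python sketch below uses the hypothetical singular matrix $A = \begin{Vmatrix}1&-1\\-1&1\end{Vmatrix}$, with characteristic values $0$ and $2$. For it, $u = x_1 + x_2$ satisfies $\ddot u = 0$ and $w = x_1 - x_2$ satisfies $\ddot w + 2w = 0$. This gives an independent closed form to compare against. No square root of $A$ is ever formed.

```python
import math


def mat_vec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def second_order_solution(A, x0, v0, t, terms=40):
    """x = cos(sqrt(A) t) x0 + (sqrt(A))^{-1} sin(sqrt(A) t) v0 via the series (62).

    The p-th terms are (-1)^p t^(2p) / (2p)! A^p x0 and
    (-1)^p t^(2p+1) / (2p+1)! A^p v0.
    """
    cos_term, sin_term = list(x0), list(v0)      # A^p x0 and A^p v0
    x = [0.0] * len(x0)
    for p in range(terms):
        c = (-1) ** p * t ** (2 * p) / math.factorial(2 * p)
        s = (-1) ** p * t ** (2 * p + 1) / math.factorial(2 * p + 1)
        x = [xi + c * a + s * b for xi, a, b in zip(x, cos_term, sin_term)]
        cos_term, sin_term = mat_vec(A, cos_term), mat_vec(A, sin_term)
    return x


A = [[1.0, -1.0], [-1.0, 1.0]]      # hypothetical singular matrix, |A| = 0
x0, v0, t = [1.0, 0.0], [0.0, 1.0], 2.0
x = second_order_solution(A, x0, v0, t)

# Independent closed form via u = x1 + x2 and w = x1 - x2.
u = (x0[0] + x0[1]) + (v0[0] + v0[1]) * t
r2 = math.sqrt(2.0)
w = (x0[0] - x0[1]) * math.cos(r2 * t) + (v0[0] - v0[1]) / r2 * math.sin(r2 * t)
expected = [(u + w) / 2, (u - w) / 2]
print(x, expected)
```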




§ 6. Stability of Motion in the Case of a Linear System 


1. Let $x_1, x_2, \ldots, x_n$ be parameters that characterize the displacement of 'perturbed' motion of a given mechanical system from an original motion,²⁷ and suppose that these parameters satisfy a system of differential equations of the first order:
$$\frac{dx_i}{dt} = f_i(x_1, x_2, \ldots, x_n, t) \qquad (i = 1, 2, \ldots, n); \qquad (65)$$
the independent variable $t$ in these equations is the time, and the right-hand sides $f_i(x_1, x_2, \ldots, x_n, t)$ are continuous functions of the variables $x_1, \ldots, x_n$ in some domain containing the point $x_1 = 0, x_2 = 0, \ldots, x_n = 0$ for all $t \ge t_0$ ($t_0$ is the initial time).


We now introduce the definition of stability of motion according to Lyapunov.²⁸

The motion to be investigated is called stable if for every $\varepsilon > 0$ we can find a $\delta > 0$ with the following property. For arbitrary initial values of the parameters $x_{10}, x_{20}, \ldots, x_{n0}$ (for $t = t_0$) with moduli less than $\delta$, the parameters $x_1, x_2, \ldots, x_n$ remain of moduli less than $\varepsilon$ for the whole time of the motion ($t \ge t_0$). In other words, for every $\varepsilon > 0$ we can find a $\delta > 0$ such that from
$$|x_{i0}| < \delta \qquad (i = 1, 2, \ldots, n) \qquad (66)$$
it follows that
$$|x_i(t)| < \varepsilon \qquad (t \ge t_0). \qquad (67)$$
If, in addition, for some $\delta > 0$ we always have $\lim_{t\to+\infty} x_i(t) = 0$ $(i = 1, 2, \ldots, n)$ as long as $|x_{i0}| < \delta$ $(i = 1, 2, \ldots, n)$, then the motion is called asymptotically stable.


We now consider a linear system, i.e., that special case when (65) is a system of linear homogeneous differential equations
$$\frac{dx_i}{dt} = \sum_{k=1}^{n} p_{ik}(t)\,x_k \qquad (i = 1, 2, \ldots, n), \qquad (68)$$
where the $p_{ik}(t)$ are continuous functions for $t \ge t_0$ $(i, k = 1, 2, \ldots, n)$.

In matrix form the system (68) can be written as follows:
$$\frac{dx}{dt} = P(t)\,x, \qquad (68')$$
where $x$ is the column matrix with the elements $x_1, x_2, \ldots, x_n$ and $P(t) = \|p_{ik}(t)\|_1^n$ is the coefficient matrix.

27 In these parameters, the motion to be studied is characterized by constant zero values $x_1 = 0, x_2 = 0, \ldots, x_n = 0$. Therefore in the mathematical treatment of the problem we speak of the 'stability' of the zero solution of the system (65) of differential equations.

28 See [14], p. 13; [9], pp. 10-11; or [36], pp. 11-12. See also [3].


We denote by
$$q_{1j}(t), q_{2j}(t), \ldots, q_{nj}(t) \qquad (j = 1, 2, \ldots, n) \qquad (69)$$
$n$ linearly independent solutions of (68).²⁹ The matrix $Q(t) = \|q_{ij}(t)\|_1^n$ whose columns are these solutions is called an integral matrix of the system (68).

Every solution of the system of linear homogeneous differential equations is obtained as a linear combination of $n$ linearly independent solutions with constant coefficients:
$$x_i = \sum_{j=1}^{n} c_j\,q_{ij}(t) \qquad (i = 1, 2, \ldots, n),$$
or in matrix form,
$$x = Q(t)\,c, \qquad (70)$$
where $c$ is the column matrix whose elements are arbitrary constants $c_1, c_2, \ldots, c_n$.
We now choose the special integral matrix for which
$$Q(t_0) = E; \qquad (71)$$
in other words, in the choice of $n$ linearly independent solutions of (68) we shall start from the following special initial conditions:³⁰
$$q_{ij}(t_0) = \delta_{ij} = \begin{cases}0 & (i \neq j),\\ 1 & (i = j)\end{cases} \qquad (i, j = 1, 2, \ldots, n).$$
Then setting $t = t_0$ in (70), we find from (71):
$$x_0 = c,$$
and therefore formula (70) assumes the form
$$x = Q(t)\,x_0 \qquad (72)$$
or, in expanded form,
$$x_i = \sum_{j=1}^{n} q_{ij}(t)\,x_{j0} \qquad (i = 1, 2, \ldots, n). \qquad (72')$$


29 Here the second subscript j denotes the number of the solution. 
30 Arbitrary initial conditions determine uniquely a certain solution of a given system. 




We consider three cases:
1. $Q(t)$ is a bounded matrix in the interval $(t_0, +\infty)$, i.e., there exists a number $M$ such that
$$|q_{ij}(t)| \le M \qquad (t \ge t_0;\ i, j = 1, 2, \ldots, n).$$
In this case it follows from (72') that
$$|x_i(t)| \le nM\max_j |x_{j0}|.$$
The condition of stability is satisfied: it is sufficient to take $\delta < \varepsilon/(nM)$ in (66) and (67). The motion characterized by the zero solution $x_1 = 0, x_2 = 0, \ldots, x_n = 0$ is stable.

2. $\lim_{t\to+\infty} Q(t) = O$. In this case the matrix $Q(t)$ is bounded in the interval $(t_0, +\infty)$ and therefore, as we have already explained, the motion is stable. Moreover, it follows from (72) that
$$\lim_{t\to+\infty} x(t) = 0$$
for every $x_0$. The motion is asymptotically stable.

3. $Q(t)$ is an unbounded matrix in the interval $(t_0, +\infty)$. This means that at least one of the functions $q_{ij}(t)$, say $q_{hk}(t)$, is not bounded in the interval. We take the initial conditions $x_{10} = 0, x_{20} = 0, \ldots, x_{k-1,0} = 0, x_{k0} \neq 0, x_{k+1,0} = 0, \ldots, x_{n0} = 0$. Then
$$x_h(t) = q_{hk}(t)\,x_{k0}.$$
However small in modulus $x_{k0}$ may be, the function $x_h(t)$ is unbounded. The condition (67) is not satisfied for any $\delta$. The motion is unstable.


2. We now consider the special case where the coefficients in the system (68) are constants:
$$P(t) = P = \text{const.} \qquad (73)$$
We have then (see § 5)
$$x = e^{P(t - t_0)}x_0. \qquad (74)$$
Comparing (74) with (72), we find that in this case
$$Q(t) = e^{P(t - t_0)}. \qquad (75)$$
We denote by
$$\psi(\lambda) = (\lambda - \lambda_1)^{m_1}(\lambda - \lambda_2)^{m_2}\cdots(\lambda - \lambda_s)^{m_s}$$
the minimal polynomial of the coefficient matrix $P$.

For the investigation of the integral matrix (75) we apply formula (17) on p. 104. In this case $f(\lambda) = e^{\lambda(t - t_0)}$ ($t$ is regarded as a parameter) and $f^{(j)}(\lambda_k) = (t - t_0)^j e^{\lambda_k(t - t_0)}$. Formula (17) yields
$$e^{P(t - t_0)} = \sum_{k=1}^{s}\left[Z_{k1} + Z_{k2}(t - t_0) + \cdots + Z_{km_k}(t - t_0)^{m_k - 1}\right]e^{\lambda_k(t - t_0)}. \qquad (76)$$


We consider three cases:

1. $\operatorname{Re}\lambda_k \le 0$ $(k = 1, 2, \ldots, s)$; and moreover, for all $\lambda_k$ with $\operatorname{Re}\lambda_k = 0$ the corresponding $m_k = 1$ (i.e., pure imaginary characteristic values are simple roots of the minimal polynomial).

2. $\operatorname{Re}\lambda_k < 0$ $(k = 1, 2, \ldots, s)$.
3. For some $k$ we have $\operatorname{Re}\lambda_k > 0$; or $\operatorname{Re}\lambda_k = 0$, but $m_k > 1$.

From the formula (76) it follows that in the first case the matrix $Q(t) = e^{P(t - t_0)}$ is bounded in the interval $(t_0, +\infty)$. In the second case $\lim_{t\to+\infty} e^{P(t - t_0)} = O$. In the third case the matrix $e^{P(t - t_0)}$ is not bounded in the interval $(t_0, +\infty)$.³¹
Therefore in the first case the motion ($x_1 = 0, x_2 = 0, \ldots, x_n = 0$) is stable, in the second case it is asymptotically stable, and in the third case it is unstable.

31 Special consideration is only required in the case when in (76) for $e^{P(t - t_0)}$ there occur several terms of maximal growth (for $t \to +\infty$). These are the terms with maximal $\operatorname{Re}\lambda_k = \alpha_0$ and, for the given $\operatorname{Re}\lambda_k = \alpha_0$, maximal value $m_k = m_0$. The expression (76) can be represented in the form
$$e^{P(t - t_0)} = e^{\alpha_0(t - t_0)}(t - t_0)^{m_0 - 1}\left[\sum_{j=1}^{r} Z_j e^{i\beta_j(t - t_0)} + (*)\right],$$
where $\beta_1, \beta_2, \ldots, \beta_r$ are distinct real numbers and $(*)$ denotes a matrix that tends to zero as $t \to +\infty$. From this representation it follows that the matrix $e^{P(t - t_0)}$ is not bounded for $\alpha_0 + m_0 - 1 > 0$. The reason is that the matrix $\sum_{j=1}^{r} Z_j e^{i\beta_j(t - t_0)}$ cannot converge to zero for $t \to +\infty$. We can see this by showing that
$$f(t) = \sum_{j=1}^{r} c_j e^{i\beta_j t},$$
where $c_j$ are complex numbers and $\beta_j$ real and distinct numbers, can converge to zero for $t \to +\infty$ only when $f(t) \equiv 0$. But, in fact, it follows from $\lim_{t\to+\infty} f(t) = 0$ that
$$\sum_{j=1}^{r}|c_j|^2 = \lim_{T\to+\infty}\frac{1}{T}\int_0^T |f(t)|^2\,dt = 0$$
and therefore
$$c_1 = c_2 = \cdots = c_r = 0.$$




The results of the investigation may be formulated in the form of the following theorem:³²

THEOREM 3: The zero solution of the linear system (68) for $P = \text{const.}$ is stable in the sense of Lyapunov if

1) the real parts of all the characteristic values of $P$ are negative or zero,

2) those characteristic values whose real part is zero, i.e., the pure imaginary characteristic values (if any such exist), are simple roots of the minimal polynomial of $P$;

and it is unstable if at least one of the conditions 1), 2) is violated.

The zero solution of the linear system (68) is asymptotically stable if and only if all the characteristic values of $P$ have negative real parts.
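Theorem 3 becomes a mechanical test once the characteristic values and their multiplicities in the minimal polynomial are known. The Python sketch below does this only for a real $2\times2$ matrix $P$, where both are easy to find. The characteristic values come from the trace and the determinant. A double characteristic value $\lambda_0$ is a simple root of $\psi(\lambda)$ exactly when $P = \lambda_0E$. The function name and the tolerance `tol` are assumptions of the sketch. Larger matrices would need reliable eigenvalue and minimal-polynomial computations, which floating-point rounding makes delicate.

```python
import cmath


def classify_2x2(P, tol=1e-12):
    """Apply Theorem 3 to dx/dt = P x for a real 2 x 2 constant matrix P."""
    (a, b), (c, d) = P
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    root = cmath.sqrt(disc)
    eigs = [(trace + root) / 2, (trace - root) / 2]
    max_re = max(z.real for z in eigs)
    if max_re < -tol:
        return "asymptotically stable"      # all Re(lambda_k) < 0
    if max_re > tol:
        return "unstable"                   # condition 1) violated
    if abs(disc) > tol:
        return "stable"                     # distinct values: simple roots of psi
    # Double characteristic value lam0 (real, here 0): psi = lambda - lam0
    # exactly when P = lam0 * E; otherwise psi = (lambda - lam0)^2.
    lam0 = trace / 2
    scalar = (abs(b) < tol and abs(c) < tol
              and abs(a - lam0) < tol and abs(d - lam0) < tol)
    return "stable" if scalar else "unstable"


examples = {
    "rotation": [[0.0, 1.0], [-1.0, 0.0]],   # characteristic values +i, -i
    "damped": [[-1.0, 0.0], [0.0, -2.0]],
    "shear": [[0.0, 1.0], [0.0, 0.0]],       # psi = lambda^2
}
for name, P in examples.items():
    print(name, classify_2x2(P))
```

The "shear" case shows why condition 2) is needed. Both characteristic values are $0$, yet the solution $x_1 = x_{10} + x_{20}t$ grows without bound.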


The considerations above enable us to make a statement about the nature of the integral matrix $e^{P(t - t_0)}$ in the general case of arbitrary characteristic values of the constant matrix $P$.

THEOREM 4: The integral matrix $e^{P(t - t_0)}$ of the linear system (68) for $P = \text{const.}$ is always representable in the form
$$e^{P(t - t_0)} = Z_-(t) + Z_0(t) + Z_+(t),$$
where
1) $\lim_{t\to+\infty} Z_-(t) = O$,
2) $Z_0(t)$ is either constant or is a bounded matrix in the interval $(t_0, +\infty)$ that does not have a limit for $t \to +\infty$,
3) $Z_+(t) = O$ or $Z_+(t)$ is an unbounded matrix in the interval $(t_0, +\infty)$.

Proof. On the right-hand side of (76) we divide all the summands into three groups. We denote by $Z_-(t)$ the sum of all the terms containing the factors $e^{\lambda_k(t - t_0)}$ with $\operatorname{Re}\lambda_k < 0$. We denote by $Z_0(t)$ the sum of all those terms $Z_{k1}e^{\lambda_k(t - t_0)}$ for which $\operatorname{Re}\lambda_k = 0$. We denote by $Z_+(t)$ the sum of all the remaining terms. It is easy to see that $Z_-(t)$, $Z_0(t)$, and $Z_+(t)$ have the properties 1), 2), 3) of the theorem.


32 On the question of sharpening the criteria of stability and instability for quasi- 
linear systems (i.e., of non-linear systems that become linear after neglecting the non- 
linear terms), see further Chapter XIV, § 3. 


CHAPTER VI 


EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES. 
ANALYTIC THEORY OF ELEMENTARY DIVISORS 


The first three sections of this chapter deal with the theory of equivalent polynomial matrices. On the basis of this, we shall develop, in the next three sections, the analytical theory of elementary divisors. This is the theory of the reduction of a constant (non-polynomial) square matrix $A$ to a normal form $\tilde{A}$ ($A = T\tilde{A}T^{-1}$). In the last two sections of the chapter two methods for the construction of the transforming matrix $T$ will be given.


§ 1. Elementary Transformations of a Polynomial Matrix 
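1. DEFINITION 1: A polynomial matrix, or $\lambda$-matrix, is a rectangular matrix $A(\lambda)$ whose elements are polynomials in $\lambda$:
$$A(\lambda) = \|a_{ik}(\lambda)\| = \left\|a_{ik}^{(0)}\lambda^l + a_{ik}^{(1)}\lambda^{l-1} + \cdots + a_{ik}^{(l)}\right\| \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n);$$
here $l$ is the largest of the degrees of the polynomials $a_{ik}(\lambda)$.
Setting
$$A_j = \|a_{ik}^{(j)}\| \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n;\ j = 0, 1, \ldots, l),$$
we may represent the polynomial matrix $A(\lambda)$ in the form of a matrix polynomial in $\lambda$, i.e., in the form of a polynomial in $\lambda$ with matrix coefficients:
$$A(\lambda) = A_0\lambda^l + A_1\lambda^{l-1} + \cdots + A_{l-1}\lambda + A_l.$$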
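The passage from the entries $a_{ik}(\lambda)$ to the coefficient matrices $A_0, \ldots, A_l$ is pure bookkeeping. In the Python sketch below each entry is stored as a list of coefficients with the highest power first, matching $a_{ik}^{(0)}\lambda^l + \cdots + a_{ik}^{(l)}$. Shorter lists are padded with leading zeros. This representation and the example $A(\lambda) = \begin{Vmatrix}\lambda^2 + 1 & 2\lambda\\ 3 & \lambda - 1\end{Vmatrix}$ are illustrative assumptions.

```python
def matrix_coefficients(entries):
    """Return [A_0, ..., A_l] with A(lambda) = A_0 lambda^l + ... + A_l.

    entries[i][k] lists the coefficients of a_ik(lambda), highest power first.
    """
    l = max(len(p) for row in entries for p in row) - 1   # largest degree
    padded = [[[0] * (l + 1 - len(p)) + list(p) for p in row] for row in entries]
    return [[[p[j] for p in row] for row in padded] for j in range(l + 1)]


def eval_entrywise(entries, lam):
    """Evaluate every polynomial entry by Horner's rule."""
    def horner(p):
        acc = 0
        for c in p:
            acc = acc * lam + c
        return acc
    return [[horner(p) for p in row] for row in entries]


def eval_matrix_poly(coeffs, lam):
    """Evaluate A_0 lam^l + A_1 lam^(l-1) + ... + A_l."""
    l = len(coeffs) - 1
    m, n = len(coeffs[0]), len(coeffs[0][0])
    return [[sum(coeffs[j][i][k] * lam ** (l - j) for j in range(l + 1))
             for k in range(n)] for i in range(m)]


# Hypothetical example: A(lambda) = [[lambda^2 + 1, 2 lambda], [3, lambda - 1]].
A_entries = [[[1, 0, 1], [2, 0]], [[3], [1, -1]]]
A0, A1, A2 = matrix_coefficients(A_entries)
print(A0, A1, A2)
```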


We introduce the following elementary operations on a polynomial matrix $A(\lambda)$:
1. Multiplication of any row, for example the $i$-th, by a number $c \neq 0$.
2. Addition to any row, for example the $i$-th, of any other row, for example the $j$-th, multiplied by any arbitrary polynomial $b(\lambda)$.
3. Interchange of any two rows, for example the $i$-th and the $j$-th.

We leave it to the reader to verify that the operations 1., 2., 3. are equivalent to a multiplication of the polynomial matrix $A(\lambda)$ on the left by the following square matrices of order $m$, respectively:¹
$$S' = \begin{Vmatrix}1&&&&\\&\ddots&&&\\&&c&&\\&&&\ddots&\\&&&&1\end{Vmatrix},\qquad S'' = \begin{Vmatrix}1&&&&&&\\&\ddots&&&&&\\&&1&\cdots&b(\lambda)&&\\&&&\ddots&\vdots&&\\&&&&1&&\\&&&&&\ddots&\\&&&&&&1\end{Vmatrix},\qquad S''' = \begin{Vmatrix}1&&&&&&\\&\ddots&&&&&\\&&0&\cdots&1&&\\&&\vdots&\ddots&\vdots&&\\&&1&\cdots&0&&\\&&&&&\ddots&\\&&&&&&1\end{Vmatrix}; \qquad (1)$$
here $c$ stands in the $(i, i)$ position of $S'$ and $b(\lambda)$ in the $(i, j)$ position of $S''$. In $S'''$ the entries $0, 1, 1, 0$ occupy the positions $(i, i), (i, j), (j, i), (j, j)$.
In other words, as the result of applying the operations 1., 2., 3. the matrix $A(\lambda)$ is transformed into $S'\cdot A(\lambda)$, $S''\cdot A(\lambda)$, and $S'''\cdot A(\lambda)$, respectively. The operations of type 1., 2., 3. are therefore called left elementary operations.

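In the same way we define the right elementary operations on a polynomial matrix (these are performed not on the rows, but on the columns);² the matrices (of order $n$) corresponding to them are:
$$T' = \begin{Vmatrix}1&&&&\\&\ddots&&&\\&&c&&\\&&&\ddots&\\&&&&1\end{Vmatrix},\qquad T'' = \begin{Vmatrix}1&&&&&&\\&\ddots&&&&&\\&&1&&&&\\&&\vdots&\ddots&&&\\&&b(\lambda)&\cdots&1&&\\&&&&&\ddots&\\&&&&&&1\end{Vmatrix},\qquad T''' = \begin{Vmatrix}1&&&&&&\\&\ddots&&&&&\\&&0&\cdots&1&&\\&&\vdots&\ddots&\vdots&&\\&&1&\cdots&0&&\\&&&&&\ddots&\\&&&&&&1\end{Vmatrix};$$
here $c$ stands in the $(i, i)$ position of $T'$ and $b(\lambda)$ in the $(j, i)$ position of $T''$.

1 In the matrices (1) all the elements that are not shown are 1 on the main diagonal and 0 elsewhere.

2 See footnote 1.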


The result of applying a right elementary operation is equivalent to multiplying the matrix $A(\lambda)$ on the right by the corresponding matrix $T$.

Note that $T'$ and $T'''$ coincide with $S'$ and $S'''$, and that $T''$ coincides with $S''$ when the indices $i$ and $j$ are interchanged in these matrices. The matrices of type $S'$, $S''$, $S'''$ (or, what is the same, $T'$, $T''$, $T'''$) will be called elementary matrices.

The determinant of every elementary matrix does not depend on $\lambda$ and is different from zero. Therefore each left (right) elementary operation has an inverse operation which is also a left (right) elementary operation.³


DEFINITION 2: Two polynomial matrices $A(\lambda)$ and $B(\lambda)$ are called 1) left-equivalent, 2) right-equivalent, 3) equivalent if one of them can be obtained from the other by means of 1) left elementary, 2) right elementary, 3) left and right elementary operations, respectively.⁴

3 It follows from this that if a matrix $B(\lambda)$ is obtained from $A(\lambda)$ by means of left (right; left and right) elementary operations, then $A(\lambda)$ can, conversely, be obtained from $B(\lambda)$ by means of elementary operations of the same type. The left elementary operations form a group, as do the right elementary operations.

4 From the definition it follows that only matrices of the same dimensions can be left-equivalent, right-equivalent, or simply equivalent.




Let $B(\lambda)$ be obtained from $A(\lambda)$ by means of the left elementary operations corresponding to $S_1, S_2, \ldots, S_p$. Then
$$B(\lambda) = S_pS_{p-1}\cdots S_1A(\lambda). \qquad (2)$$
Denoting the product $S_pS_{p-1}\cdots S_1$ by $P(\lambda)$, we write (2) in the form
$$B(\lambda) = P(\lambda)A(\lambda), \qquad (3)$$
where $P(\lambda)$, like each of the matrices $S_1, S_2, \ldots, S_p$, has a constant⁵ non-zero determinant.

In the next section we shall prove that every square $\lambda$-matrix $P(\lambda)$ with a constant non-zero determinant can be represented in the form of a product of elementary matrices. Therefore (3) is equivalent to (2) and signifies left equivalence of the matrices $A(\lambda)$ and $B(\lambda)$.

In the case of right equivalence of the polynomial matrices $A(\lambda)$ and $B(\lambda)$ we shall have instead of (3) the equation
$$B(\lambda) = A(\lambda)Q(\lambda) \qquad (3')$$
and in the case of (two-sided) equivalence the equation
$$B(\lambda) = P(\lambda)A(\lambda)Q(\lambda). \qquad (3'')$$
Here again, $P(\lambda)$ and $Q(\lambda)$ are matrices with non-zero determinants, independent of $\lambda$.
Thus, Definition 2 can be replaced by an equivalent definition.

DEFINITION 2': Two rectangular $\lambda$-matrices $A(\lambda)$ and $B(\lambda)$ are called 1) left-equivalent, 2) right-equivalent, 3) equivalent if

1) $B(\lambda) = P(\lambda)A(\lambda)$,

2) $B(\lambda) = A(\lambda)Q(\lambda)$,

3) $B(\lambda) = P(\lambda)A(\lambda)Q(\lambda)$,
respectively, where $P(\lambda)$ and $Q(\lambda)$ are polynomial square matrices with constant non-zero determinants.
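Definition 2' can be illustrated with exact polynomial arithmetic. In the Python sketch below polynomials are integer coefficient lists with the lowest power first. Three hypothetical $2\times2$ elementary matrices, one of each type (1), are multiplied into $P(\lambda)$. The determinant $|P(\lambda)|$ comes out as the constant $-3$, so $B(\lambda) = P(\lambda)A(\lambda)$ has determinant $-3\,|A(\lambda)|$. The helper names and the sample $A(\lambda) = \begin{Vmatrix}\lambda & 1\\ 2 & \lambda^2 - 1\end{Vmatrix}$ are assumptions for illustration.

```python
def trim(p):
    """Drop zero coefficients of the highest powers (lowest power first)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def p_add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])


def p_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def m_mul(X, Y):
    """Product of polynomial matrices whose entries are coefficient lists."""
    out = []
    for row in X:
        new_row = []
        for j in range(len(Y[0])):
            acc = [0]
            for k, entry in enumerate(row):
                acc = p_add(acc, p_mul(entry, Y[k][j]))
            new_row.append(acc)
        out.append(new_row)
    return out


def det2(M):
    """Determinant of a 2 x 2 polynomial matrix."""
    return p_add(p_mul(M[0][0], M[1][1]), p_mul([-1], p_mul(M[0][1], M[1][0])))


ONE, ZERO = [1], [0]
S1 = [[ONE, ZERO], [ZERO, [3]]]          # type 1: second row times c = 3
S2 = [[ONE, [1, 0, 1]], [ZERO, ONE]]     # type 2: row 1 += (lambda^2 + 1) * row 2
S3 = [[ZERO, ONE], [ONE, ZERO]]          # type 3: interchange rows 1 and 2
P = m_mul(S3, m_mul(S2, S1))             # P(lambda) = S3 S2 S1

A = [[[0, 1], ONE], [[2], [-1, 0, 1]]]   # A(lambda) = [[lambda, 1], [2, lambda^2 - 1]]
B = m_mul(P, A)                          # left-equivalent to A(lambda)
print(det2(P), det2(A), det2(B))
```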


2. All the concepts introduced above are illustrated in the following important example.

We consider a system of $m$ linear homogeneous differential equations of order $l$ with constant coefficients, where $x_1, x_2, \ldots, x_n$ are $n$ unknown functions of the independent variable $t$:
$$\begin{aligned}a_{11}(D)x_1 + a_{12}(D)x_2 + \cdots + a_{1n}(D)x_n &= 0,\\ a_{21}(D)x_1 + a_{22}(D)x_2 + \cdots + a_{2n}(D)x_n &= 0,\\ &\;\;\vdots\\ a_{m1}(D)x_1 + a_{m2}(D)x_2 + \cdots + a_{mn}(D)x_n &= 0;\end{aligned} \qquad (4)$$
here
$$a_{ik}(D) = a_{ik}^{(0)}D^l + a_{ik}^{(1)}D^{l-1} + \cdots + a_{ik}^{(l)} \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n)$$
is a polynomial in $D$ with constant coefficients; $D = \frac{d}{dt}$ is the differential operator.
The matrix of operator coefficients
$$A(D) = \|a_{ik}(D)\| \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n)$$
is a polynomial matrix, or $D$-matrix.

5 I.e., independent of $\lambda$.

Clearly, the left elementary operation 1. on the matrix $A(D)$ signifies term-by-term multiplication of the $i$-th differential equation of the system by the number $c \ne 0$. The left elementary operation 2. signifies the term-by-term addition to the $i$-th equation of the $j$-th equation which has previously been subjected to the differential operator $b(D)$. The left elementary operation 3. signifies an interchange of the $i$-th and $j$-th equations.

Thus, if we replace in (4) the matrix $A(D)$ of operator coefficients by a left-equivalent matrix $B(D)$, we obtain a deduced system of equations. Since, conversely, by the same reasoning, the original system is a consequence of the new system, the two systems of equations are equivalent.⁶

It is not difficult in this example to interpret the right elementary operations as well. The first of them signifies the introduction of a new unknown function $\tilde{x}_i = \frac{1}{c} x_i$ for the unknown function $x_i$; the second signifies the introduction of a new unknown function $\tilde{x}_i = x_i + b(D) x_j$ (instead of $x_i$); the third signifies the interchange of the terms in the equations that contain $x_i$ and $x_j$ (i.e., $\tilde{x}_i = x_j$, $\tilde{x}_j = x_i$).


§ 2. Canonical Form of a $\lambda$-Matrix


1. To begin with, we shall examine what comparatively simple form we can obtain for a rectangular polynomial matrix $A(\lambda)$ by means of left elementary operations only.

Let us assume that the first column of $A(\lambda)$ contains elements not identically equal to zero. Among them we choose a polynomial of least degree and by a permutation of the rows we make it into the element $a_{11}(\lambda)$. Then we divide $a_{i1}(\lambda)$ by $a_{11}(\lambda)$; we denote quotient and remainder by $q_{i1}(\lambda)$ and $r_{i1}(\lambda)$ $(i = 2, \ldots, m)$:


6 Here it is assumed that the unknown functions $x_1, x_2, \ldots, x_n$ are such that their derivatives of all orders, as far as they occur in the transformations, exist. With this restriction, two systems of equations with left-equivalent matrices $A(D)$ and $B(D)$ have the same solutions.




$$a_{i1}(\lambda) = a_{11}(\lambda)\, q_{i1}(\lambda) + r_{i1}(\lambda) \qquad (i = 2, \ldots, m).$$


Now we subtract from the $i$-th row the first row multiplied by $q_{i1}(\lambda)$ $(i = 2, \ldots, m)$. If not all the remainders $r_{i1}(\lambda)$ are identically equal to zero, then we choose one of them that is not equal to zero and is of least degree and put it into the place of $a_{11}(\lambda)$ by a permutation of the rows. As the result of all these operations, the degree of the polynomial $a_{11}(\lambda)$ is reduced.

Now we repeat this process. Since the degree of the polynomial $a_{11}(\lambda)$ is finite, this must come to an end at some stage, i.e., at this stage all the elements $a_{21}(\lambda), a_{31}(\lambda), \ldots, a_{m1}(\lambda)$ turn out to be identically equal to zero.

Next we take the element $a_{22}(\lambda)$ and apply the same procedure to the rows numbered $2, 3, \ldots, m$, achieving $a_{32}(\lambda) = \cdots = a_{m2}(\lambda) = 0$. Continuing still further, we finally reduce the matrix $A(\lambda)$ to the following form:


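$$\begin{pmatrix}
b_{11}(\lambda) & b_{12}(\lambda) & \cdots & b_{1m}(\lambda) & \cdots & b_{1n}(\lambda)\\
0 & b_{22}(\lambda) & \cdots & b_{2m}(\lambda) & \cdots & b_{2n}(\lambda)\\
\vdots & \vdots & \ddots & \vdots & & \vdots\\
0 & 0 & \cdots & b_{mm}(\lambda) & \cdots & b_{mn}(\lambda)
\end{pmatrix} \ (m \le n), \qquad
\begin{pmatrix}
b_{11}(\lambda) & b_{12}(\lambda) & \cdots & b_{1n}(\lambda)\\
0 & b_{22}(\lambda) & \cdots & b_{2n}(\lambda)\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & b_{nn}(\lambda)\\
0 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 0
\end{pmatrix} \ (m \ge n). \tag{5}$$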


If the polynomial $b_{22}(\lambda)$ is not identically equal to zero, then by applying a left elementary operation of the second type we can make the degree of the element $b_{12}(\lambda)$ less than the degree of $b_{22}(\lambda)$ (if $b_{22}(\lambda)$ is of degree zero, then $b_{12}(\lambda)$ becomes identically equal to zero). In the same way, if $b_{33}(\lambda) \not\equiv 0$, then by left elementary operations of the second type we make the degrees of the elements $b_{13}(\lambda)$, $b_{23}(\lambda)$ less than the degree of $b_{33}(\lambda)$ without changing the elements $b_{12}(\lambda)$, etc.


We have established the following theorem: 


THEOREM 1: An arbitrary rectangular polynomial matrix of dimension $m \times n$ can always be brought into the form (5) by means of left elementary operations, where the polynomials $b_{1k}(\lambda), b_{2k}(\lambda), \ldots, b_{k-1,k}(\lambda)$ are of degree less than that of $b_{kk}(\lambda)$, provided $b_{kk}(\lambda) \not\equiv 0$, and are all identically equal to zero if $b_{kk}(\lambda) = \text{const.} \ne 0$ $(k = 2, 3, \ldots, \min(m, n))$.
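The reduction just described is constructive, so it can be sketched in code. The Python fragment below is a minimal illustration, not part of the original text. It assumes the field $F$ is the rational numbers, stores a polynomial as a list of `Fraction` coefficients with the lowest degree first (`[]` is the zero polynomial), and uses helper names (`pdivmod`, `left_reduce`, and so on) that are our own. It clears each column below the diagonal by bringing up an element of least degree and subtracting polynomial multiples of its row, then lowers the degrees above every non-zero diagonal element, as in the proof of Theorem 1.

```python
from fractions import Fraction

# Polynomials: lists of Fraction coefficients, lowest degree first; [] is zero.
def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def sub(p, q):
    return add(p, [-c for c in q])

def mul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def pdivmod(a, b):
    """Quotient and remainder of a on division by a nonzero polynomial b."""
    a = trim([Fraction(c) for c in a])
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        shift = len(a) - len(b)
        c = a[-1] / b[-1]
        q[shift] = c
        a = sub(a, [Fraction(0)] * shift + [c * x for x in b])
    return trim(q), a

def left_reduce(A):
    """Bring an m x n polynomial matrix to the form (5) by row operations only."""
    A = [[trim([Fraction(c) for c in e]) for e in row] for row in A]
    m, n = len(A), len(A[0])
    for k in range(min(m, n)):
        # Clear column k below the diagonal (operations of types 2 and 3).
        while any(A[i][k] for i in range(k + 1, m)):
            p = min((i for i in range(k, m) if A[i][k]), key=lambda i: len(A[i][k]))
            A[k], A[p] = A[p], A[k]
            for i in range(k + 1, m):
                q, _ = pdivmod(A[i][k], A[k][k])
                A[i] = [sub(x, mul(q, y)) for x, y in zip(A[i], A[k])]
    # Lower the degrees of the elements above each nonzero diagonal element.
    for k in range(min(m, n)):
        if A[k][k]:
            for i in range(k):
                q, _ = pdivmod(A[i][k], A[k][k])
                A[i] = [sub(x, mul(q, y)) for x, y in zip(A[i], A[k])]
    return A

# Rows (l, l^2) and (l + 1, 1), coefficients lowest degree first.
print(left_reduce([[[0, 1], [0, 0, 1]], [[1, 1], [1]]]))
```

For this $2 \times 2$ example the sketch returns the rows $(1,\ 1 - \lambda^2)$ and $(0,\ \lambda^3 + \lambda^2 - \lambda)$; the determinant has changed only by the constant factor $-1$, as it must under elementary operations.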

Similarly, we prove 


THEOREM 2: An arbitrary rectangular polynomial matrix of dimension $m \times n$ can always be brought into the form




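$$\begin{pmatrix}
c_{11}(\lambda) & 0 & \cdots & 0 & 0 & \cdots & 0\\
c_{21}(\lambda) & c_{22}(\lambda) & \cdots & 0 & 0 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots\\
c_{m1}(\lambda) & c_{m2}(\lambda) & \cdots & c_{mm}(\lambda) & 0 & \cdots & 0
\end{pmatrix} \ (m \le n), \qquad
\begin{pmatrix}
c_{11}(\lambda) & 0 & \cdots & 0\\
c_{21}(\lambda) & c_{22}(\lambda) & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
c_{n1}(\lambda) & c_{n2}(\lambda) & \cdots & c_{nn}(\lambda)\\
\vdots & \vdots & & \vdots\\
c_{m1}(\lambda) & c_{m2}(\lambda) & \cdots & c_{mn}(\lambda)
\end{pmatrix} \ (m \ge n) \tag{6}$$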


by means of right elementary operations, where the polynomials $c_{k1}(\lambda), c_{k2}(\lambda), \ldots, c_{k,k-1}(\lambda)$ are of degree less than that of $c_{kk}(\lambda)$, provided $c_{kk}(\lambda) \not\equiv 0$, and all are identically equal to zero if $c_{kk}(\lambda) = \text{const.} \ne 0$ $(k = 2, 3, \ldots, \min(m, n))$.

2. From Theorems 1 and 2 we deduce the corollary:

COROLLARY: If the determinant of a square polynomial matrix $P(\lambda)$ does not depend on $\lambda$ and is different from zero, then the matrix can be represented in the form of a product of a finite number of elementary matrices.

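For by Theorem 1 the matrix $P(\lambda)$ can be brought into the form

$$\begin{pmatrix}
b_{11}(\lambda) & b_{12}(\lambda) & \cdots & b_{1n}(\lambda)\\
0 & b_{22}(\lambda) & \cdots & b_{2n}(\lambda)\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & b_{nn}(\lambda)
\end{pmatrix} \tag{7}$$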


by left elementary operations, where $n$ is the order of $P(\lambda)$. Since in the application of elementary operations to a square polynomial matrix the determinant of the matrix is only multiplied by constant non-zero factors, the determinant of the matrix (7), like that of $P(\lambda)$, does not depend on $\lambda$ and is different from 0, i.e.,


$$b_{11}(\lambda)\, b_{22}(\lambda) \cdots b_{nn}(\lambda) = \text{const.} \ne 0.$$

Hence

$$b_{kk}(\lambda) = \text{const.} \ne 0 \qquad (k = 1, 2, \ldots, n).$$


But then, also by Theorem 1, the matrix (7) has the diagonal form $\| b_k \delta_{ik} \|_1^n$ and can therefore be reduced to the unit matrix $E$ by means of left elementary operations of type 1. But then, conversely, the unit matrix $E$ can be transformed into $P(\lambda)$ by means of the left elementary operations whose matrices are $S_1, S_2, \ldots, S_p$. Therefore

$$P(\lambda) = S_p S_{p-1} \cdots S_1 E = S_p S_{p-1} \cdots S_1.$$




As we pointed out on p. 133, from this corollary there follows the equiva- 
lence of the two Definitions 2 and 2’ of equivalence of polynomial matrices. 


3. Let us return to our example of the system of differential equations (4). We apply Theorem 1 to the matrix $\| a_{ik}(D) \|$ of operator coefficients. As we have shown on p. 135, the system (4) is then replaced by an equivalent system


$$\begin{aligned}
b_{11}(D)x_1 + b_{12}(D)x_2 + \cdots + b_{1s}(D)x_s &= -b_{1,s+1}(D)x_{s+1} - \cdots - b_{1n}(D)x_n,\\
b_{22}(D)x_2 + \cdots + b_{2s}(D)x_s &= -b_{2,s+1}(D)x_{s+1} - \cdots - b_{2n}(D)x_n,\\
&\;\;\vdots\\
b_{ss}(D)x_s &= -b_{s,s+1}(D)x_{s+1} - \cdots - b_{sn}(D)x_n,
\end{aligned} \tag{4'}$$

where $s = \min(m, n)$. In this system we may choose the functions $x_{s+1}, \ldots, x_n$ arbitrarily, after which the functions $x_s, x_{s-1}, \ldots, x_1$ can be determined successively; however, at each stage of this process only one differential equation with one unknown function has to be integrated.


4. We now pass on to establishing the 'canonical' form into which a rectangular matrix $A(\lambda)$ can be brought by applying to it both left and right elementary operations.

Among all the elements $a_{ik}(\lambda)$ of $A(\lambda)$ that are not identically equal to zero we choose one which has the least degree in $\lambda$ and by suitable permutations of the rows and columns we make this element into $a_{11}(\lambda)$. Then we find the quotients and remainders of the polynomials $a_{i1}(\lambda)$ and $a_{1k}(\lambda)$ on division by $a_{11}(\lambda)$:


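$$a_{i1}(\lambda) = a_{11}(\lambda)\, q_{i1}(\lambda) + r_{i1}(\lambda), \qquad a_{1k}(\lambda) = a_{11}(\lambda)\, q_{1k}(\lambda) + r_{1k}(\lambda) \qquad (i = 2, \ldots, m;\ k = 2, \ldots, n).$$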

If at least one of the remainders $r_{i1}(\lambda)$, $r_{1k}(\lambda)$ $(i = 2, \ldots, m;\ k = 2, \ldots, n)$, for example $r_{1k}(\lambda)$, is not identically equal to zero, then by subtracting from the $k$-th column the first column multiplied by $q_{1k}(\lambda)$, we replace $a_{1k}(\lambda)$ by the remainder $r_{1k}(\lambda)$, which is of smaller degree than $a_{11}(\lambda)$. Then we can again reduce the degree of the element in the top left corner of the matrix by putting in its place an element of smaller degree in $\lambda$.

But if all the remainders $r_{21}(\lambda), \ldots, r_{m1}(\lambda);\ r_{12}(\lambda), \ldots, r_{1n}(\lambda)$ are identically equal to zero, then by subtracting from the $i$-th row the first multiplied by $q_{i1}(\lambda)$ $(i = 2, \ldots, m)$ and from the $k$-th column the first multiplied by $q_{1k}(\lambda)$ $(k = 2, \ldots, n)$, we reduce our polynomial matrix to the form


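$$\begin{pmatrix}
a_{11}(\lambda) & 0 & \cdots & 0\\
0 & a_{22}(\lambda) & \cdots & a_{2n}(\lambda)\\
\vdots & \vdots & & \vdots\\
0 & a_{m2}(\lambda) & \cdots & a_{mn}(\lambda)
\end{pmatrix}.$$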




If at least one of the elements $a_{ik}(\lambda)$ $(i = 2, \ldots, m;\ k = 2, \ldots, n)$ is not divisible without remainder by $a_{11}(\lambda)$, then by adding to the first column that column which contains such an element we arrive at the preceding case and can therefore again replace the element $a_{11}(\lambda)$ by a polynomial of smaller degree.

Since the original element $a_{11}(\lambda)$ had a definite degree and since the process of reducing this degree cannot be continued indefinitely, we must, after a finite number of elementary operations, obtain a matrix of the form


$$\begin{pmatrix}
a_1(\lambda) & 0 & \cdots & 0\\
0 & b_{22}(\lambda) & \cdots & b_{2n}(\lambda)\\
\vdots & \vdots & & \vdots\\
0 & b_{m2}(\lambda) & \cdots & b_{mn}(\lambda)
\end{pmatrix}, \tag{8}$$

in which all the elements $b_{ik}(\lambda)$ are divisible without remainder by $a_1(\lambda)$. If among these elements $b_{ik}(\lambda)$ there is one not identically equal to zero, then continuing the same reduction process on the rows numbered $2, \ldots, m$ and the columns $2, \ldots, n$, we reduce the matrix (8) to the form


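$$\begin{pmatrix}
a_1(\lambda) & 0 & 0 & \cdots & 0\\
0 & a_2(\lambda) & 0 & \cdots & 0\\
0 & 0 & c_{33}(\lambda) & \cdots & c_{3n}(\lambda)\\
\vdots & \vdots & \vdots & & \vdots\\
0 & 0 & c_{m3}(\lambda) & \cdots & c_{mn}(\lambda)
\end{pmatrix},$$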


where a@2(A) is divisible without remainder by a,(4) and all the polynomials 
cix.(A) are divisible without remainder by ae(A). Continuing the process 
further, we finally arrive at a matrix of the form 


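$$\begin{pmatrix}
a_1(\lambda) & 0 & \cdots & 0 & 0 & \cdots & 0\\
0 & a_2(\lambda) & \cdots & 0 & 0 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & a_s(\lambda) & 0 & \cdots & 0\\
0 & 0 & \cdots & 0 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}, \tag{9}$$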

where the polynomials $a_1(\lambda), a_2(\lambda), \ldots, a_s(\lambda)$ $(s \le \min(m, n))$ are not identically equal to zero and each is divisible by the preceding one.

By multiplying the first $s$ rows by suitable non-zero numerical factors, we can arrange that the highest coefficients of the polynomials $a_1(\lambda), a_2(\lambda), \ldots, a_s(\lambda)$ are equal to 1.




DEFINITION 3: A rectangular polynomial matrix is called a canonical diagonal matrix if it is of the form (9), where 1) the polynomials $a_1(\lambda), a_2(\lambda), \ldots, a_s(\lambda)$ are not identically equal to zero and 2) each of the polynomials $a_2(\lambda), \ldots, a_s(\lambda)$ is divisible by the preceding. Moreover, it is assumed that the highest coefficients of all the polynomials $a_1(\lambda), a_2(\lambda), \ldots, a_s(\lambda)$ are equal to 1.


Thus, we have proved that: An arbitrary rectangular polynomial matrix $A(\lambda)$ is equivalent to a canonical diagonal matrix. In the next section we shall prove that: The polynomials $a_1(\lambda), a_2(\lambda), \ldots, a_s(\lambda)$ are uniquely determined by the given matrix $A(\lambda)$; and we shall set up formulas that connect these polynomials with the elements of $A(\lambda)$.
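The procedure of subsection 4 can likewise be sketched. The Python code below is an illustrative implementation under the same assumptions as a plain exercise in exact arithmetic (coefficients in the rational field, polynomials as `Fraction` coefficient lists with the lowest degree first; all function names are hypothetical, not taken from the text). At each stage it moves an element of least degree to the corner, clears the first row and column by division with remainder, and, if some remaining element is not divisible by the corner element, adds that element's column to the first column, as in the argument above; finally it makes the corner element monic.

```python
from fractions import Fraction

# Polynomials: lists of Fraction coefficients, lowest degree first; [] is zero.
def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def sub(p, q):
    return add(p, [-c for c in q])

def mul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def pdivmod(a, b):
    """Quotient and remainder of a on division by a nonzero polynomial b."""
    a = trim([Fraction(c) for c in a])
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        shift = len(a) - len(b)
        c = a[-1] / b[-1]
        q[shift] = c
        a = sub(a, [Fraction(0)] * shift + [c * x for x in b])
    return trim(q), a

def smith_diagonal(A):
    """Monic diagonal a_1, a_2, ..., a_s of the canonical diagonal form (9)."""
    A = [[trim([Fraction(c) for c in e]) for e in row] for row in A]
    m, n = len(A), len(A[0])
    diag = []
    for t in range(min(m, n)):
        while True:
            cells = [(i, j) for i in range(t, m) for j in range(t, n) if A[i][j]]
            if not cells:
                return diag                        # the remaining block is zero
            i0, j0 = min(cells, key=lambda c: len(A[c[0]][c[1]]))
            A[t], A[i0] = A[i0], A[t]              # move an element of least
            for row in A:                          # degree to position (t, t)
                row[t], row[j0] = row[j0], row[t]
            p = A[t][t]
            for i in range(t + 1, m):              # row operations on column t
                q, _ = pdivmod(A[i][t], p)
                A[i] = [sub(x, mul(q, y)) for x, y in zip(A[i], A[t])]
            for j in range(t + 1, n):              # column operations on row t
                q, _ = pdivmod(A[t][j], p)
                for row in A:
                    row[j] = sub(row[j], mul(q, row[t]))
            if any(A[i][t] for i in range(t + 1, m)) or any(A[t][j] for j in range(t + 1, n)):
                continue                           # nonzero remainder: degree drops
            bad = [j for i in range(t + 1, m) for j in range(t + 1, n)
                   if pdivmod(A[i][j], p)[1]]
            if bad:                                # add that column to column t
                for row in A:
                    row[t] = add(row[t], row[bad[0]])
                continue
            lead = p[-1]
            A[t] = [[c / lead for c in e] for e in A[t]]   # make a_t monic
            diag.append(A[t][t])
            break
    return diag

# Characteristic matrix lE - A of the example in Section 3 (entries low degree first).
char = [[[-3, 1], [-1], [], []],
        [[4], [1, 1], [], []],
        [[-6], [-1], [-2, 1], [-1]],
        [[14], [5], [1], [0, 1]]]
print(smith_diagonal(char) == [[1], [1], [1, -2, 1], [1, -2, 1]])  # True
```

Applied to the characteristic matrix $\lambda E - A$ of the numerical example in § 3, the sketch yields the diagonal $1, 1, (\lambda - 1)^2, (\lambda - 1)^2$, in agreement with the hand computation given there.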


§ 3. Invariant Polynomials and Elementary Divisors 
of a Polynomial Matrix 


1. We introduce the concept of invariant polynomials of a $\lambda$-matrix $A(\lambda)$.

Let $A(\lambda)$ be a polynomial matrix of rank $r$, i.e., the matrix has minors of order $r$ not identically equal to zero, but all the minors of order greater than $r$ are identically equal to zero in $\lambda$. We denote by $D_j(\lambda)$ the greatest common divisor of all the minors of order $j$ in $A(\lambda)$ $(j = 1, 2, \ldots, r)$.⁷ Then it is easy to see that in the series
it is easy to see that in the series 


$$D_r(\lambda),\ D_{r-1}(\lambda),\ \ldots,\ D_1(\lambda),\ D_0(\lambda) \equiv 1$$


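each polynomial is divisible by the one that follows it.⁸ The corresponding quotients will be denoted by $i_1(\lambda), i_2(\lambda), \ldots, i_r(\lambda)$:

$$i_1(\lambda) = \frac{D_r(\lambda)}{D_{r-1}(\lambda)},\quad i_2(\lambda) = \frac{D_{r-1}(\lambda)}{D_{r-2}(\lambda)},\quad \ldots,\quad i_r(\lambda) = \frac{D_1(\lambda)}{D_0(\lambda)} = D_1(\lambda). \tag{10}$$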


DEFINITION 4: The polynomials $i_1(\lambda), i_2(\lambda), \ldots, i_r(\lambda)$ defined by (10) are called the invariant polynomials of the rectangular matrix $A(\lambda)$.
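For small matrices formula (10) can be evaluated directly. The sketch below, again in Python with rational coefficient lists (lowest degree first) and hypothetical function names, computes each $D_j(\lambda)$ as the monic greatest common divisor of all minors of order $j$ and then forms the quotients (10). The minors are expanded along the first row, so this is only practical for small orders; it is meant as an illustration of the definition, not as an efficient method.

```python
from fractions import Fraction
from itertools import combinations

# Polynomials: lists of Fraction coefficients, lowest degree first; [] is zero.
def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def sub(p, q):
    return add(p, [-c for c in q])

def mul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def pdivmod(a, b):
    """Quotient and remainder of a on division by a nonzero polynomial b."""
    a = trim([Fraction(c) for c in a])
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        shift = len(a) - len(b)
        c = a[-1] / b[-1]
        q[shift] = c
        a = sub(a, [Fraction(0)] * shift + [c * x for x in b])
    return trim(q), a

def monic(p):
    return [c / p[-1] for c in p] if p else []

def pgcd(a, b):
    """Monic greatest common divisor by Euclid's algorithm; pgcd([], []) == []."""
    while b:
        a, b = b, pdivmod(a, b)[1]
    return monic(a)

def det(M):
    """Determinant of a square polynomial matrix, expanded along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = []
    for j, e in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        term = mul(e, det(minor))
        total = add(total, term) if j % 2 == 0 else sub(total, term)
    return total

def invariant_polynomials(A):
    """i_1, ..., i_r of a polynomial matrix A, computed from formula (10)."""
    A = [[trim([Fraction(c) for c in e]) for e in row] for row in A]
    m, n = len(A), len(A[0])
    D = [[Fraction(1)]]                       # D_0 = 1
    for j in range(1, min(m, n) + 1):
        g = []
        for rows in combinations(range(m), j):
            for cols in combinations(range(n), j):
                g = pgcd(g, det([[A[r][c] for c in cols] for r in rows]))
        if not g:                             # every minor of order j vanishes
            break
        D.append(g)
    r = len(D) - 1                            # the rank of A
    return [pdivmod(D[r - k + 1], D[r - k])[0] for k in range(1, r + 1)]

print(invariant_polynomials([[[0, 1], []], [[], [0, 1]]]) == [[0, 1], [0, 1]])  # True
```

For the diagonal matrix with entries $\lambda, \lambda$ this gives $D_1 = \lambda$, $D_2 = \lambda^2$ and hence $i_1 = i_2 = \lambda$.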

The term 'invariant polynomial' is explained by the following arguments. Let $A(\lambda)$ and $B(\lambda)$ be two equivalent polynomial matrices. Then they are obtained from one another by means of elementary operations. But an easy verification shows immediately that the elementary operations


7 We take the highest coefficient in $D_j(\lambda)$ to be 1 $(j = 1, 2, \ldots, r)$.

8 If we expand an arbitrary minor of order $j$ by the elements of any row, then every term in the expansion is divisible by $D_{j-1}(\lambda)$; therefore every minor of order $j$, and hence $D_j(\lambda)$, is divisible by $D_{j-1}(\lambda)$ $(j = 2, 3, \ldots, r)$.




change neither the rank of $A(\lambda)$ nor the polynomials $D_1(\lambda), D_2(\lambda), \ldots, D_r(\lambda)$. For when we apply to the identity (3'') the formula that expresses a minor of a product of matrices by the minors of the factors (see p. 12), we obtain for an arbitrary minor of $B(\lambda)$ the expression


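$$B\begin{pmatrix} i_1 & i_2 & \cdots & i_p\\ k_1 & k_2 & \cdots & k_p \end{pmatrix} = \sum_{\substack{1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_p \le m\\ 1 \le \beta_1 < \beta_2 < \cdots < \beta_p \le n}} P\begin{pmatrix} i_1 & i_2 & \cdots & i_p\\ \alpha_1 & \alpha_2 & \cdots & \alpha_p \end{pmatrix} A\begin{pmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_p\\ \beta_1 & \beta_2 & \cdots & \beta_p \end{pmatrix} Q\begin{pmatrix} \beta_1 & \beta_2 & \cdots & \beta_p\\ k_1 & k_2 & \cdots & k_p \end{pmatrix}$$

$$(p = 1, 2, \ldots, \min(m, n)).$$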


Hence it follows that all the minors of order greater than $r$ of the matrix $B(\lambda)$ are zero, so that we have for the rank $r^*$ of $B(\lambda)$:

$$r^* \le r.$$


Moreover, it follows from the same formula that $D_p^*(\lambda)$, the greatest common divisor of all the minors of order $p$ of $B(\lambda)$, is divisible by $D_p(\lambda)$ $(p = 1, 2, \ldots, \min(m, n))$. But the matrices $A(\lambda)$ and $B(\lambda)$ can exchange roles. Therefore $r \le r^*$ and $D_p(\lambda)$ is divisible by $D_p^*(\lambda)$ $(p = 1, 2, \ldots, \min(m, n))$. Hence⁹

$$r^* = r,\quad D_1^*(\lambda) = D_1(\lambda),\quad D_2^*(\lambda) = D_2(\lambda),\quad \ldots,\quad D_r^*(\lambda) = D_r(\lambda).$$


Since elementary operations do not change the polynomials $D_1(\lambda), D_2(\lambda), \ldots, D_r(\lambda)$, they also leave the polynomials $i_1(\lambda), i_2(\lambda), \ldots, i_r(\lambda)$ defined by (10) unchanged.

Thus, the polynomials $i_1(\lambda), i_2(\lambda), \ldots, i_r(\lambda)$ remain invariant on transition from one matrix to another equivalent one.

If the polynomial matrix has the canonical diagonal form (9), then it is easy to see that for this matrix

$$D_1(\lambda) = a_1(\lambda),\quad D_2(\lambda) = a_1(\lambda)\, a_2(\lambda),\quad \ldots,\quad D_r(\lambda) = a_1(\lambda)\, a_2(\lambda) \cdots a_r(\lambda).$$


But then, by (10), the diagonal polynomials $a_1(\lambda), a_2(\lambda), \ldots, a_r(\lambda)$ in (9) coincide with the invariant polynomials

$$i_1(\lambda) = a_r(\lambda),\quad i_2(\lambda) = a_{r-1}(\lambda),\quad \ldots,\quad i_r(\lambda) = a_1(\lambda). \tag{11}$$


Here $i_1(\lambda), i_2(\lambda), \ldots, i_r(\lambda)$ are at the same time the invariant polynomials of the original matrix $A(\lambda)$, because it is equivalent to (9).

The results obtained can be stated in the form of the following theorem.


9 The highest coefficients in $D_p(\lambda)$ and $D_p^*(\lambda)$ $(p = 1, 2, \ldots, r)$ are 1.




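THEOREM 3: The rectangular polynomial matrix $A(\lambda)$ is always equivalent to a canonical diagonal matrix

$$\begin{pmatrix}
i_r(\lambda) & 0 & \cdots & 0 & 0 & \cdots & 0\\
0 & i_{r-1}(\lambda) & \cdots & 0 & 0 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & i_1(\lambda) & 0 & \cdots & 0\\
0 & 0 & \cdots & 0 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}. \tag{12}$$

Moreover, $r$ must here be the rank of $A(\lambda)$ and $i_1(\lambda), i_2(\lambda), \ldots, i_r(\lambda)$ the invariant polynomials of $A(\lambda)$ defined by (10).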


COROLLARY 1: Two rectangular matrices of the same dimension $A(\lambda)$ and $B(\lambda)$ are equivalent if and only if they have the same invariant polynomials.

The necessity of the condition was explained above. The sufficiency follows from the fact that two polynomial matrices having the same invariant polynomials are equivalent to one and the same canonical diagonal matrix and, therefore, to each other. Thus: The invariant polynomials form a complete system of invariants of a $\lambda$-matrix.


COROLLARY 2: In the sequence of invariant polynomials

$$i_1(\lambda) = \frac{D_r(\lambda)}{D_{r-1}(\lambda)},\quad i_2(\lambda) = \frac{D_{r-1}(\lambda)}{D_{r-2}(\lambda)},\quad \ldots,\quad i_r(\lambda) = \frac{D_1(\lambda)}{D_0(\lambda)} \qquad (D_0(\lambda) \equiv 1) \tag{13}$$

every polynomial from the second onwards divides the preceding one.

This statement does not follow immediately from (13). It does follow from the fact that the polynomials $i_1(\lambda), i_2(\lambda), \ldots, i_r(\lambda)$ coincide with the polynomials $a_r(\lambda), a_{r-1}(\lambda), \ldots, a_1(\lambda)$ of the canonical diagonal matrix (9).

2. We now indicate a method of computing the invariant polynomials of a quasi-diagonal $\lambda$-matrix if the invariant polynomials of the matrices in the diagonal blocks are known.


THEOREM 4: If in a quasi-diagonal rectangular matrix

$$C(\lambda) = \begin{pmatrix} A(\lambda) & O\\ O & B(\lambda) \end{pmatrix}$$

every invariant polynomial of $A(\lambda)$ divides every invariant polynomial of $B(\lambda)$, then the set of invariant polynomials of $C(\lambda)$ is the union of the invariant polynomials of $A(\lambda)$ and $B(\lambda)$.




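Proof. We denote by $i_1'(\lambda), i_2'(\lambda), \ldots, i_r'(\lambda)$ and $i_1''(\lambda), i_2''(\lambda), \ldots, i_q''(\lambda)$, respectively, the invariant polynomials of the $\lambda$-matrices $A(\lambda)$ and $B(\lambda)$. Then¹⁰

$$A(\lambda) \sim \{i_r'(\lambda), \ldots, i_1'(\lambda), 0, \ldots, 0\},\qquad B(\lambda) \sim \{i_q''(\lambda), \ldots, i_1''(\lambda), 0, \ldots, 0\}$$

and therefore

$$C(\lambda) \sim \{i_r'(\lambda), \ldots, i_1'(\lambda), i_q''(\lambda), \ldots, i_1''(\lambda), 0, \ldots, 0\}. \tag{14}$$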


The $\lambda$-matrix on the right-hand side of this relation is of canonical diagonal form. By Theorem 3 the diagonal elements of this matrix that are not identically equal to zero then form a complete system of invariants of the polynomial matrix $C(\lambda)$. This proves the theorem.

In order to determine the invariant polynomials of $C(\lambda)$ in the general case of arbitrary invariant polynomials of $A(\lambda)$ and $B(\lambda)$ we make use of the important concept of elementary divisors.

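We decompose the invariant polynomials $i_1(\lambda), i_2(\lambda), \ldots, i_r(\lambda)$ into irreducible factors over the given number field $F$:¹¹

$$\begin{aligned}
i_1(\lambda) &= [\varphi_1(\lambda)]^{c_1}[\varphi_2(\lambda)]^{c_2}\cdots[\varphi_s(\lambda)]^{c_s},\\
i_2(\lambda) &= [\varphi_1(\lambda)]^{d_1}[\varphi_2(\lambda)]^{d_2}\cdots[\varphi_s(\lambda)]^{d_s},\\
&\;\;\vdots\\
i_r(\lambda) &= [\varphi_1(\lambda)]^{l_1}[\varphi_2(\lambda)]^{l_2}\cdots[\varphi_s(\lambda)]^{l_s}
\end{aligned}
\qquad (c_k \ge d_k \ge \cdots \ge l_k \ge 0;\ k = 1, 2, \ldots, s). \tag{15}$$

Here $\varphi_1(\lambda), \varphi_2(\lambda), \ldots, \varphi_s(\lambda)$ are all the distinct factors irreducible over $F$ (and with highest coefficient 1) that occur in $i_1(\lambda), i_2(\lambda), \ldots, i_r(\lambda)$.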


DEFINITION 5: All the powers among $[\varphi_1(\lambda)]^{c_1}, \ldots, [\varphi_s(\lambda)]^{l_s}$ in (15), as far as they are distinct from 1, are called the elementary divisors of the matrix $A(\lambda)$ in the field $F$.¹²


THEOREM 5: The set of elementary divisors of the rectangular quasi-diagonal matrix

$$C(\lambda) = \begin{pmatrix} A(\lambda) & O\\ O & B(\lambda) \end{pmatrix}$$

is always obtained by combining the elementary divisors of $A(\lambda)$ with those of $B(\lambda)$.


10 The symbol $\sim$ denotes here the equivalence of matrices, and braces $\{\ \}$ denote a diagonal rectangular matrix of the form (12).

11 Some of the exponents $c_k, d_k, \ldots, l_k$ $(k = 1, 2, \ldots, s)$ may be equal to zero.

12 The formulas (15) enable us to define not only the elementary divisors of $A(\lambda)$ in the field $F$ in terms of the invariant polynomials but also, conversely, the invariant polynomials in terms of the elementary divisors.




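Proof. We decompose the invariant polynomials of $A(\lambda)$ and $B(\lambda)$ into irreducible factors over $F$:¹³

$$\begin{aligned}
i_1'(\lambda) &= [\varphi_1(\lambda)]^{c_1'}[\varphi_2(\lambda)]^{c_2'}\cdots[\varphi_s(\lambda)]^{c_s'}, &\quad i_1''(\lambda) &= [\varphi_1(\lambda)]^{c_1''}[\varphi_2(\lambda)]^{c_2''}\cdots[\varphi_s(\lambda)]^{c_s''},\\
i_2'(\lambda) &= [\varphi_1(\lambda)]^{d_1'}[\varphi_2(\lambda)]^{d_2'}\cdots[\varphi_s(\lambda)]^{d_s'}, &\quad i_2''(\lambda) &= [\varphi_1(\lambda)]^{d_1''}[\varphi_2(\lambda)]^{d_2''}\cdots[\varphi_s(\lambda)]^{d_s''},\\
&\;\;\vdots & &\;\;\vdots\\
i_r'(\lambda) &= [\varphi_1(\lambda)]^{l_1'}[\varphi_2(\lambda)]^{l_2'}\cdots[\varphi_s(\lambda)]^{l_s'}, &\quad i_q''(\lambda) &= [\varphi_1(\lambda)]^{l_1''}[\varphi_2(\lambda)]^{l_2''}\cdots[\varphi_s(\lambda)]^{l_s''}.
\end{aligned}$$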


We denote by

$$c_1 \ge d_1 \ge \cdots \ge h_1 > 0 \tag{16}$$

all the non-zero numbers among $c_1', d_1', \ldots, l_1', c_1'', d_1'', \ldots, l_1''$.
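Then the matrix $C(\lambda)$ is equivalent to the matrix (14), and by a permutation of rows and of columns the latter can be brought into 'diagonal' form

$$\{[\varphi_1(\lambda)]^{c_1}(*),\ [\varphi_1(\lambda)]^{d_1}(*),\ \ldots,\ [\varphi_1(\lambda)]^{h_1}(*),\ (**),\ \ldots,\ (**)\}, \tag{17}$$

where we have denoted by $(*)$ polynomials that are prime to $\varphi_1(\lambda)$ and by $(**)$ polynomials that are either prime to $\varphi_1(\lambda)$ or identically equal to zero. From the form of the matrix (17) we deduce immediately the following decomposition of the polynomials $D_\rho(\lambda), D_{\rho-1}(\lambda), \ldots$ and $i_1(\lambda), i_2(\lambda), \ldots$ of the matrix $C(\lambda)$, where $\rho$ is the rank of $C(\lambda)$:

$$D_\rho(\lambda) = [\varphi_1(\lambda)]^{c_1 + d_1 + \cdots + h_1}(*),\quad D_{\rho-1}(\lambda) = [\varphi_1(\lambda)]^{d_1 + \cdots + h_1}(*),\quad \ldots,$$

$$i_1(\lambda) = [\varphi_1(\lambda)]^{c_1}(*),\quad i_2(\lambda) = [\varphi_1(\lambda)]^{d_1}(*),\quad \ldots.$$

Hence it follows that $[\varphi_1(\lambda)]^{c_1}, [\varphi_1(\lambda)]^{d_1}, \ldots, [\varphi_1(\lambda)]^{h_1}$, i.e., all the powers

$$[\varphi_1(\lambda)]^{c_1'}, \ldots, [\varphi_1(\lambda)]^{l_1'},\ [\varphi_1(\lambda)]^{c_1''}, \ldots, [\varphi_1(\lambda)]^{l_1''},$$

as far as they are distinct from 1, are elementary divisors of $C(\lambda)$.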

The elementary divisors of $C(\lambda)$ that are powers of $\varphi_2(\lambda)$ are determined similarly, etc. This completes the proof of the theorem.
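Once the invariant polynomials are factored, the bookkeeping in (15), in footnote 12 and in Theorem 5 is purely combinatorial. The Python sketch below assumes the factorization is already known and records each invariant polynomial as a dictionary mapping a label of an irreducible factor (an arbitrary string chosen here, such as `"l-1"` for $\lambda - 1$) to its exponent; the function names are hypothetical. Elementary divisors are collected as a multiset, and the invariant polynomials are rebuilt by distributing the exponents of each factor in decreasing order over $i_1, i_2, \ldots$, which is what the inequalities in (15) prescribe.

```python
from collections import Counter

def elementary_divisors(invariants):
    """Multiset of elementary divisors (factor, exponent), as in Definition 5.

    `invariants` lists i_1, i_2, ... as dicts {irreducible factor: exponent}."""
    return Counter((f, e) for inv in invariants for f, e in inv.items() if e > 0)

def from_elementary_divisors(divisors, r):
    """Rebuild i_1, ..., i_r from the elementary divisors (footnote 12).

    r must be at least the number of elementary divisors sharing one factor."""
    exponents = {}
    for (f, e), mult in divisors.items():
        exponents.setdefault(f, []).extend([e] * mult)
    invariants = [{} for _ in range(r)]
    for f, es in exponents.items():
        # the largest power goes into i_1, the next into i_2, and so on
        for k, e in enumerate(sorted(es, reverse=True)):
            invariants[k][f] = e
    return invariants

inv = [{"l-1": 2, "l+1": 1}, {"l-1": 1}, {}]
ed = elementary_divisors(inv)            # (l-1)^2, (l+1), (l-1)
assert from_elementary_divisors(ed, 3) == inv

# Theorem 5: the elementary divisors of C = {A, B} are those of A and B together.
ed_A = elementary_divisors([{"l": 2}])
ed_B = elementary_divisors([{"l-1": 1}])
print(from_elementary_divisors(ed_A + ed_B, 2))  # [{'l': 2, 'l-1': 1}, {}]
```

The last lines illustrate Theorem 5 in a case where Theorem 4 does not apply: if $A(\lambda)$ has the single invariant polynomial $\lambda^2$ and $B(\lambda)$ the single invariant polynomial $\lambda - 1$, the elementary divisors of $C(\lambda)$ are $\lambda^2$ and $\lambda - 1$, so $C(\lambda)$ has the invariant polynomials $\lambda^2(\lambda - 1)$ and $1$.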

Note. The theory of equivalence for integral matrices (i.e., matrices whose elements are integers) can be constructed along similar lines. Here in 1., 2. (see pp. 130–31) $c = \pm 1$, $b(\lambda)$ is to be replaced by an integer, and in (3), (3'), (3''), in place of $P(\lambda)$ and $Q(\lambda)$ there are integral matrices with determinants equal to $\pm 1$.


13 If any irreducible polynomial $\varphi_k(\lambda)$ occurs as a factor in some invariant polynomials, but not in others, then in the latter we write $\varphi_k(\lambda)$ with a zero exponent.




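3. Suppose now that $A = \| a_{ik} \|_1^n$ is a matrix with elements in the field $F$. We form its characteristic matrix

$$\lambda E - A = \begin{pmatrix}
\lambda - a_{11} & -a_{12} & \cdots & -a_{1n}\\
-a_{21} & \lambda - a_{22} & \cdots & -a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
-a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn}
\end{pmatrix}. \tag{18}$$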


The characteristic matrix is a $\lambda$-matrix of rank $n$. Its invariant polynomials

$$i_1(\lambda) = \frac{D_n(\lambda)}{D_{n-1}(\lambda)},\quad i_2(\lambda) = \frac{D_{n-1}(\lambda)}{D_{n-2}(\lambda)},\quad \ldots,\quad i_n(\lambda) = \frac{D_1(\lambda)}{D_0(\lambda)} \qquad (D_0(\lambda) \equiv 1) \tag{19}$$

are called the invariant polynomials of the matrix $A$ and the corresponding elementary divisors in $F$ are called the elementary divisors of the matrix $A$ in the field $F$. A knowledge of the invariant polynomials (and, hence, of the elementary divisors) of $A$ enables us to investigate its structure. Therefore practical methods of computing the invariant polynomials of a matrix are of interest. The formulas (19) give an algorithm for computing these polynomials, but for large $n$ this algorithm is very cumbersome.

Theorem 3 gives another method of computing invariant polynomials, based on the reduction of the characteristic matrix (18) to canonical diagonal form by means of elementary operations.


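Example:

$$A = \begin{pmatrix} 3 & 1 & 0 & 0\\ -4 & -1 & 0 & 0\\ 6 & 1 & 2 & 1\\ -14 & -5 & -1 & 0 \end{pmatrix}, \qquad \lambda E - A = \begin{pmatrix} \lambda - 3 & -1 & 0 & 0\\ 4 & \lambda + 1 & 0 & 0\\ -6 & -1 & \lambda - 2 & -1\\ 14 & 5 & 1 & \lambda \end{pmatrix}.$$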


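In the characteristic matrix $\lambda E - A$ we add to the fourth row the third multiplied by $\lambda$:

$$\begin{pmatrix} \lambda - 3 & -1 & 0 & 0\\ 4 & \lambda + 1 & 0 & 0\\ -6 & -1 & \lambda - 2 & -1\\ 14 - 6\lambda & 5 - \lambda & \lambda^2 - 2\lambda + 1 & 0 \end{pmatrix}.$$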


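Now adding to the first three columns the fourth, multiplied by $-6$, $-1$ and $\lambda - 2$, respectively, we obtain

$$\begin{pmatrix} \lambda - 3 & -1 & 0 & 0\\ 4 & \lambda + 1 & 0 & 0\\ 0 & 0 & 0 & -1\\ 14 - 6\lambda & 5 - \lambda & \lambda^2 - 2\lambda + 1 & 0 \end{pmatrix}.$$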
We add to the first column the second multiplied by $\lambda - 3$:

$$\begin{pmatrix} 0 & -1 & 0 & 0\\ \lambda^2 - 2\lambda + 1 & \lambda + 1 & 0 & 0\\ 0 & 0 & 0 & -1\\ -\lambda^2 + 2\lambda - 1 & 5 - \lambda & \lambda^2 - 2\lambda + 1 & 0 \end{pmatrix}.$$


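To the second and fourth rows we add the first multiplied by $\lambda + 1$ and $5 - \lambda$, respectively; we obtain

$$\begin{pmatrix} 0 & -1 & 0 & 0\\ \lambda^2 - 2\lambda + 1 & 0 & 0 & 0\\ 0 & 0 & 0 & -1\\ -\lambda^2 + 2\lambda - 1 & 0 & \lambda^2 - 2\lambda + 1 & 0 \end{pmatrix}.$$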


To the fourth row we add the second; then we multiply the first and third rows by $-1$. After permuting some rows and columns we obtain:

$$\begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & (\lambda - 1)^2 & 0\\ 0 & 0 & 0 & (\lambda - 1)^2 \end{pmatrix}.$$


The matrix has two elementary divisors $(\lambda - 1)^2$ and $(\lambda - 1)^2$.
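The outcome of this example can be cross-checked numerically. The sketch below relies on two standard facts that are not proved in this section: the first invariant polynomial $i_1(\lambda)$ of $A$ is its minimal polynomial, and the number of elementary divisors that are powers of $\lambda - 1$ equals $n - \operatorname{rank}(A - E)$. With elementary divisors $(\lambda - 1)^2$, $(\lambda - 1)^2$ one therefore expects $(A - E)^2 = 0$ and $\operatorname{rank}(A - E) = 2$. The helper names `matmul` and `rank` are our own.

```python
from fractions import Fraction

A = [[3, 1, 0, 0],
     [-4, -1, 0, 0],
     [6, 1, 2, 1],
     [-14, -5, -1, 0]]
n = len(A)
N = [[A[i][j] - (1 if i == j else 0) for j in range(n)] for i in range(n)]  # A - E

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def rank(M):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

N2 = matmul(N, N)
print(rank(N), all(x == 0 for row in N2 for x in row))  # 2 True
```

Both checks agree with the two elementary divisors $(\lambda - 1)^2$ found above: there are $4 - 2 = 2$ of them, and none has exponent greater than 2.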


§ 4. Equivalence of Linear Binomials 


1. In the preceding sections we have considered rectangular $\lambda$-matrices. In the present section we consider two square $\lambda$-matrices $A(\lambda)$ and $B(\lambda)$ of order $n$ in which all the elements are of degree not higher than 1 in $\lambda$. These polynomial matrices may be represented in the form of matrix binomials:

$$A(\lambda) = A_0\lambda + A_1,\qquad B(\lambda) = B_0\lambda + B_1.$$

We shall assume that these binomials are of degree 1 and regular, i.e., that $|A_0| \ne 0$, $|B_0| \ne 0$ (see p. 76).
The following theorem gives a criterion for the equivalence of such binomials:

THEOREM 6: If two regular binomials of the first degree $A_0\lambda + A_1$ and $B_0\lambda + B_1$ are equivalent, then they are strictly equivalent, i.e., in the identity

$$B_0\lambda + B_1 = P(\lambda)(A_0\lambda + A_1)Q(\lambda) \tag{20}$$

the matrices $P(\lambda)$ and $Q(\lambda)$, with constant non-zero determinants, can be replaced by constant non-singular matrices $P$ and $Q$:¹⁴

$$B_0\lambda + B_1 = P(A_0\lambda + A_1)Q. \tag{21}$$


14 The identity (21) is equivalent to the two matrix equations: $B_0 = PA_0Q$ and $B_1 = PA_1Q$.




Proof. Since the determinant of $P(\lambda)$ does not depend on $\lambda$ and is different from zero,¹⁵ the inverse matrix $M(\lambda) = P^{-1}(\lambda)$ is also a polynomial matrix. With the help of this matrix we write (20) in the form

$$M(\lambda)(B_0\lambda + B_1) = (A_0\lambda + A_1)Q(\lambda). \tag{22}$$

Regarding $M(\lambda)$ and $Q(\lambda)$ as matrix polynomials, we divide $M(\lambda)$ on the left by $A_0\lambda + A_1$ and $Q(\lambda)$ on the right by $B_0\lambda + B_1$:

$$M(\lambda) = (A_0\lambda + A_1)S(\lambda) + M, \tag{23}$$

$$Q(\lambda) = T(\lambda)(B_0\lambda + B_1) + Q; \tag{24}$$

here $M$ and $Q$ are constant square matrices (independent of $\lambda$) of order $n$.

We substitute these expressions for $M(\lambda)$ and $Q(\lambda)$ in (22). After a few small transformations, we obtain

$$(A_0\lambda + A_1)[T(\lambda) - S(\lambda)](B_0\lambda + B_1) = M(B_0\lambda + B_1) - (A_0\lambda + A_1)Q. \tag{25}$$


The difference in the brackets must be identically equal to zero; for otherwise the product on the left-hand side of (25) would be of degree $\ge 2$, while the polynomial on the right-hand side of the equation is of degree not higher than 1. Therefore

$$S(\lambda) = T(\lambda); \tag{26}$$


But then we obtain from (25):

$$M(B_0\lambda + B_1) = (A_0\lambda + A_1)Q. \tag{27}$$
We shall now show that $M$ is a non-singular matrix. For this purpose we divide $P(\lambda)$ on the left by $B_0\lambda + B_1$:

$$P(\lambda) = (B_0\lambda + B_1)U(\lambda) + P. \tag{28}$$
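From (22), (23), and (28) we deduce:

$$\begin{aligned}
E = M(\lambda)P(\lambda) &= M(\lambda)(B_0\lambda + B_1)U(\lambda) + M(\lambda)P\\
&= (A_0\lambda + A_1)Q(\lambda)U(\lambda) + (A_0\lambda + A_1)S(\lambda)P + MP\\
&= (A_0\lambda + A_1)[Q(\lambda)U(\lambda) + S(\lambda)P] + MP.
\end{aligned} \tag{29}$$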
15 The equivalence of the binomials $A_0\lambda + A_1$ and $B_0\lambda + B_1$ means that an identity (20) exists in which $|P(\lambda)| = \text{const.} \ne 0$ and $|Q(\lambda)| = \text{const.} \ne 0$. However, in this case the last relations follow from (20) itself. For the determinants of regular binomials of the first degree are of degree $n$:

$$|A_0\lambda + A_1| = |A_0|\lambda^n + \cdots,\quad |B_0\lambda + B_1| = |B_0|\lambda^n + \cdots;\quad |A_0| \ne 0,\ |B_0| \ne 0.$$

Therefore it follows from

$$|B_0\lambda + B_1| = |P(\lambda)|\,|A_0\lambda + A_1|\,|Q(\lambda)|$$

that

$$|P(\lambda)| = \text{const.} \ne 0,\qquad |Q(\lambda)| = \text{const.} \ne 0.$$




Since the last term of this chain of equations must be of degree zero in $\lambda$ (because it is equal to $E$), the expression in brackets must be identically equal to zero. But then from (29)

$$MP = E, \tag{30}$$

so that $|M| \ne 0$ and $M^{-1} = P$.
Multiplying both sides of (27) on the left by $P$, we obtain:

$$B_0\lambda + B_1 = P(A_0\lambda + A_1)Q.$$

The fact that $P$ is non-singular follows from (30). That $P$ and $Q$ are non-singular also follows directly from (21), since this identity implies

$$B_0 = PA_0Q$$

and therefore

$$|P|\,|A_0|\,|Q| = |B_0| \ne 0.$$

This completes the proof of the theorem.

Note. From the proof it follows (see (24) and (28)) that the constant matrices $P$ and $Q$ by which we have replaced the $\lambda$-matrices $P(\lambda)$ and $Q(\lambda)$ in (20) can be taken as the left and right remainders, respectively, of $P(\lambda)$ and $Q(\lambda)$ on division by $B_0\lambda + B_1$.


§ 5. A Criterion for Similarity of Matrices 


1. Let $A = \| a_{ik} \|_1^n$ be a matrix with numerical elements from the field $F$. Its characteristic matrix $\lambda E - A$ is a $\lambda$-matrix of rank $n$ and therefore has $n$ invariant polynomials (see § 3)

$$i_1(\lambda),\ i_2(\lambda),\ \ldots,\ i_n(\lambda).$$


The following theorem shows that these invariant polynomials deter- 
mine the original matrix A to within similarity transformations. 


THEOREM 7: Two matrices $A = \| a_{ik} \|_1^n$ and $B = \| b_{ik} \|_1^n$ are similar $(B = T^{-1}AT)$ if and only if they have the same invariant polynomials or, what is the same, the same elementary divisors in the field $F$.

Proof. The condition is necessary. For if the matrices $A$ and $B$ are similar, then there exists a non-singular matrix $T$ such that

$$B = T^{-1}AT.$$

Hence

$$\lambda E - B = T^{-1}(\lambda E - A)T.$$

This equation shows that the characteristic matrices $\lambda E - A$ and $\lambda E - B$ are equivalent and therefore have the same invariant polynomials.




The condition is sufficient. Suppose that the characteristic matrices $\lambda E - A$ and $\lambda E - B$ have the same invariant polynomials. Then these $\lambda$-matrices are equivalent (see Corollary 1 to Theorem 3) and there exist, in consequence, two polynomial matrices $P(\lambda)$ and $Q(\lambda)$ such that

$$\lambda E - B = P(\lambda)(\lambda E - A)Q(\lambda). \tag{31}$$

Applying Theorem 6 to the matrix binomials $\lambda E - A$ and $\lambda E - B$, we may replace in (31) the $\lambda$-matrices $P(\lambda)$ and $Q(\lambda)$ by constant matrices $P$ and $Q$:

$$\lambda E - B = P(\lambda E - A)Q; \tag{32}$$

moreover, $P$ and $Q$ may be taken (see the Note on p. 147) as the left remainder and the right remainder, respectively, of $P(\lambda)$ and $Q(\lambda)$ on division by $\lambda E - B$, i.e., by the Generalized Bézout Theorem, we may set:¹⁶

$$P = P(B),\qquad Q = Q(B). \tag{33}$$
Equating coefficients of the powers of $\lambda$ on both sides of (32), we obtain:

$$B = PAQ,\qquad E = PQ,$$

i.e.,

$$B = T^{-1}AT,$$

where

$$T = Q = P^{-1}.$$

This proves the theorem.


2. Note. We have incidentally established the following result, which we state separately:

SUPPLEMENT TO THEOREM 7. If $A = \| a_{ik} \|_1^n$ and $B = \| b_{ik} \|_1^n$ are two similar matrices,

$$B = T^{-1}AT, \tag{34}$$

then we can choose as the transforming matrix $T$ the matrix

$$T = Q(B) = [P(B)]^{-1}, \tag{35}$$

where $P(\lambda)$ and $Q(\lambda)$ are polynomial matrices in the identity

$$\lambda E - B = P(\lambda)(\lambda E - A)Q(\lambda)$$

which connects the equivalent characteristic matrices $\lambda E - A$ and $\lambda E - B$; in (35) $Q(B)$ denotes the right value of the matrix polynomial $Q(\lambda)$, and $P(B)$ the left value of $P(\lambda)$, when the argument is replaced by $B$.
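The Generalized Bézout Theorem used in (33) is easy to check on a small case. The Python sketch below (helper names hypothetical) stores a matrix polynomial $Q(\lambda) = Q_0 + Q_1\lambda + \cdots + Q_m\lambda^m$ as the list of its coefficient matrices, lowest power first. It performs the right division $Q(\lambda) = T(\lambda)(\lambda E - B) + R$ by the recursion $T_{m-1} = Q_m$, $T_{k-1} = Q_k + T_kB$, $R = Q_0 + T_0B$, which comes from comparing coefficients, and compares the remainder with the right value $Q(B) = Q_0 + Q_1B + \cdots + Q_mB^m$. Left division and the left value $P(B) = P_0 + BP_1 + \cdots + B^mP_m$ work the same way with the factors in the opposite order.

```python
def mm(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def madd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def right_divide(Q, B):
    """Q(l) = sum Q[k] l^k = T(l)(l E - B) + R; returns (coefficients of T, R)."""
    m = len(Q) - 1
    T = [None] * m
    carry = Q[m]
    for k in range(m, 0, -1):
        T[k - 1] = carry                  # T_{k-1} = Q_k + T_k B, with T_{m-1} = Q_m
        carry = madd(Q[k - 1], mm(carry, B))
    return T, carry                       # at the end carry = Q_0 + T_0 B = R

def right_value(Q, B):
    """Right value Q(B) = sum Q[k] B^k: coefficients on the left, powers of B on the right."""
    n = len(B)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    total = [[0] * n for _ in range(n)]
    for coeff in Q:
        total = madd(total, mm(coeff, power))
        power = mm(power, B)
    return total

B = [[1, 2], [0, 1]]
Q = [[[1, 0], [0, 1]], [[0, 1], [1, 0]], [[2, 0], [1, 1]]]   # Q_0, Q_1, Q_2
T, R = right_divide(Q, B)
print(R, right_value(Q, B))   # [[3, 9], [2, 8]] [[3, 9], [2, 8]]
```

For this data the remainder and the right value coincide, as the Generalized Bézout Theorem asserts.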


16 We recall that $P(B)$ is the left value of the polynomial $P(\lambda)$ and $Q(B)$ the right value of $Q(\lambda)$, when $\lambda$ is replaced by $B$ (see p. 81).


§ 6. The Normal Forms of a Matrix 


1. Let

$$g(\lambda) = \lambda^m + \alpha_1\lambda^{m-1} + \cdots + \alpha_{m-1}\lambda + \alpha_m$$

be a polynomial with coefficients in $F$.
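We consider the square matrix of order $m$

$$L = \begin{pmatrix}
0 & 0 & \cdots & 0 & -\alpha_m\\
1 & 0 & \cdots & 0 & -\alpha_{m-1}\\
0 & 1 & \cdots & 0 & -\alpha_{m-2}\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & 1 & -\alpha_1
\end{pmatrix}. \tag{36}$$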
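It is not difficult to verify that $g(\lambda)$ is the characteristic polynomial of $L$:

$$|\lambda E - L| = \begin{vmatrix}
\lambda & 0 & \cdots & 0 & \alpha_m\\
-1 & \lambda & \cdots & 0 & \alpha_{m-1}\\
0 & -1 & \cdots & 0 & \alpha_{m-2}\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & -1 & \lambda + \alpha_1
\end{vmatrix} = g(\lambda).$$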


On the other hand, the minor of the element $\alpha_m$ in the characteristic determinant is equal to $\pm 1$. Therefore $D_{m-1}(\lambda) = 1$ and

$$i_1(\lambda) = \frac{D_m(\lambda)}{D_{m-1}(\lambda)} = D_m(\lambda) = g(\lambda),\qquad i_2(\lambda) = \cdots = i_m(\lambda) = 1.$$

Thus, $L$ has a single invariant polynomial different from 1, namely $g(\lambda)$.

We shall call $L$ the companion matrix of the polynomial $g(\lambda)$.
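The claim that $g(\lambda)$ is the characteristic polynomial of the companion matrix (36) can be tested numerically. The sketch below (function names our own) builds $L$ from the coefficients $\alpha_1, \ldots, \alpha_m$ and evaluates $|tE - L|$ by exact Gaussian elimination at integer points $t$; two polynomials of degree $m$ that agree at more than $m$ points coincide, so a handful of points suffices.

```python
from fractions import Fraction

def companion(alphas):
    """Companion matrix (36) of g(l) = l^m + alphas[0] l^(m-1) + ... + alphas[m-1]."""
    m = len(alphas)
    L = [[Fraction(0)] * m for _ in range(m)]
    for i in range(1, m):
        L[i][i - 1] = Fraction(1)                     # ones on the subdiagonal
    for i in range(m):
        L[i][m - 1] = -Fraction(alphas[m - 1 - i])    # last column: -alpha_m, ..., -alpha_1
    return L

def det(M):
    """Determinant by exact Gaussian elimination with row swaps."""
    M = [row[:] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        piv = next((i for i in range(c, n) if M[i][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d
        d *= M[c][c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return d

def char_value(L, t):
    """Value of |tE - L| at the number t."""
    n = len(L)
    return det([[(t if i == j else 0) - L[i][j] for j in range(n)] for i in range(n)])

L = companion([2, -5, 3])          # g(l) = l^3 + 2 l^2 - 5 l + 3
print(char_value(L, 2))            # 9, which is g(2)
```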
Let $A = \| a_{ik} \|_1^n$ be a matrix with the invariant polynomials

$$i_1(\lambda),\ i_2(\lambda),\ \ldots,\ i_t(\lambda),\ i_{t+1}(\lambda) = 1,\ \ldots,\ i_n(\lambda) = 1. \tag{37}$$

Here the polynomials $i_1(\lambda), i_2(\lambda), \ldots, i_t(\lambda)$ have positive degrees and, from the second onwards, each divides the preceding one. We denote the companion matrices of these polynomials by $L_1, L_2, \ldots, L_t$.

Then the quasi-diagonal matrix of order $n$

$$L_{\mathrm{I}} = \{L_1, L_2, \ldots, L_t\} \tag{38}$$

has the polynomials (37) as its invariant polynomials (see Theorem 4 on p. 141). Since the matrices $A$ and $L_{\mathrm{I}}$ have the same invariant polynomials, they are similar, i.e., there always exists a non-singular matrix $U$ $(|U| \ne 0)$ such that

$$A = U L_{\mathrm{I}} U^{-1}. \tag{I}$$


The matrix $L_{\mathrm{I}}$ is called the first natural normal form of the matrix $A$. This normal form is characterized by: 1) the quasi-diagonal form (38), 2) the special structure of the diagonal blocks (36), and 3) the additional condition: in the sequence of characteristic polynomials of the diagonal blocks every polynomial from the second onwards divides the preceding one.¹⁷


2. We now denote by

$$\chi_1(\lambda),\ \chi_2(\lambda),\ \ldots,\ \chi_u(\lambda) \tag{39}$$

the elementary divisors of $A = \| a_{ik} \|_1^n$ in the number field $F$. The corresponding companion matrices will be denoted by

$$L^{(1)}, L^{(2)}, \ldots, L^{(u)}.$$


Since $\chi_j(\lambda)$ is the only elementary divisor of $L^{(j)}$ $(j = 1, 2, \ldots, u)$,¹⁸ the quasi-diagonal matrix

$$L_{\mathrm{II}} = \{L^{(1)}, L^{(2)}, \ldots, L^{(u)}\} \tag{40}$$

has, by Theorem 5, the polynomials (39) as its elementary divisors.

The matrices $A$ and $L_{\mathrm{II}}$ have the same elementary divisors in $F$. Therefore the matrices are similar, i.e., there always exists a non-singular matrix $V$ $(|V| \ne 0)$ such that

$$A = V L_{\mathrm{II}} V^{-1}. \tag{II}$$


The matrix $L_{\mathrm{II}}$ is called the second natural normal form of the matrix $A$. This normal form is characterized by: 1) the quasi-diagonal form (40), 2) the special structure of the diagonal blocks (36), and 3) the additional condition: the characteristic polynomial of each diagonal block is a power of an irreducible polynomial over $F$.

Note. The elementary divisors of a matrix $A$, in contrast to the invariant polynomials, are essentially connected with the given number field $F$. If we choose instead of the original field $F$ another number field (which also contains the elements of the given matrix $A$), then the elementary divisors may change. Together with the elementary divisors, the second natural normal form of a matrix also changes.


17 From the conditions 1), 2), 3) it follows automatically that the characteristic poly- 
nomials of the diagonal blocks in L, are the invariant polynomials of the matrix L, and, 
hence, of A. 

18 x4(A) is the only invariant polynomial of L‘/) and is at the same time a power of a 
polynomial irreducible over F. 


§ 6. THe Normau Forms or a Matrix 151 


Suppose, for example, that A = || ay, ||? is a matrix with real elements. 
The characteristic polynomial of the matrix then has real coefficients. But 
this polynomial may have complex roots. If F is the field of real numbers, 
then among the elementary divisors there may also be powers of irreducible 
quadratic trinomials with real coefficients. If F is the field of complex 
numbers, then every elementary divisor has the form (4 — 4,)?. 


3. Let us assume now that the number field Fcontains not only the elements 
of A, but also the characteristic values of the matrix.’® Then the elementary 
divisors of A have the form”® 


(A—A)P, (A—Ag)™, ---, (A—A,)?u (Py t+ Petes t+ p,=n). (41) 
We consider one of these elementary divisors: 
(A — Ay)? 
and associate with it the following matrix of order p: 


) 40 0...0 
0’”4 1 ..0 


Po ee | gO 4+ HO, (42) 
000...1 
0 0 0 1; 


It is easy to verify that this matrix has only the one elementary divisor 
(A —A,)?. The matrix (42) will be called the Jordan block corresponding to 
the elementary divisor (4 — 4,)?. 
The Jordan blocks corresponding to the elementary divisors (41) will 
be denoted by 
Dy jhe 5. ay Sipe 
Then the quasi-diagonal matrix 
J={d,,d,,..., 5,} 


has the powers (41) as its elementary divisors. 
The matrix J can also be written in the form 
J={4,2,+ A,, 4,#,4+Hq,... 4,4,4+ F,}; 


where 
E,=E), H,=H) (k=1,2,...,4u). 


19 This always holds for an arbitrary matrix 4 if F is the field of complex numbers. 
20 Among the numbers 4, A,, ..., dy there may be some that are equal. 


152. VI. EqQuivaLEnt TRANSFORMATIONS OF POLYNOMIAL MaTRICES 


Since the matrices A and J have the same elementary divisors, they are 
similar, i.e., there exists a non-singular matrix T(| T | 40) such that 


A=UJT 1 =T (A,B, + A,, Aghgt Hy, ..., AH, +H,)T* (1) 


The matrix J is called the Jordan normal form or simply Jordan form 
of A. The Jordan normal form is characterized by its quasi-diagonal form 
and by the special structure (42) of the diagonal blocks. 

The following scheme describes the Jordan matrix J for the elementary 
divisors (4 — 4,)?, (A— 42)8, 4— As, (A— Aa)?: 


a4, 1:00000 01 
04:0 0000 01 
00 4 10:00 0 
ya 9 0.0 4 1:0 0 0 a) 
00:0 0 4:0 0 0 
000 0 0.4.0 0 
00000041 
000000 04 


If (and only if) all the elementary divisors of a matrix A are of the first 
degree, the Jordan form is a diagonal matrix, and in this case we have: 


A=T (Ay, gy eer AQT (44) 


Thus: A matrix A has simple structure (see Chapter ITI, § 8) tf and only 
if all its elementary divisors are of the first degree.” 
Instead of the Jordan block (42) sometimes the ‘lower’ Jordan block of 


order p is used: © 


ly 0... 0 Of 

1a... 00 

De — ak”) 4 FP” 
‘Ay (0 


0. ..01A 


This matrix also has the single elementary divisor (A—d,)? only. To the 
elementary divisors (41) there corresponds the lower Jordan matrix.” 


21 The elementary divisors of degree 1 are often called ‘linear’ or ‘simple’ elementary 


divisors. 
22 The matrix J is often called the upper Jordan matrix, in contrast to the lower 


Jordan matrix J (3). 


§ 7. THe ELEMENTaRY Drivisors or a MatrIx 153 


Jay= {1014+ F,, AH, + F. » ee e9 AH, + F,} 
(E, =Ee), FL, =F); £=1,2,...,4). 
An arbitrary matrix A having the elementary divisors (41) is always 


similar to Ji), ie., there exists a non-singular matrix T, (| 7; | 540) such 
that 


APTI (TS TE Bi Ae Ps ee AT OY) 
We also note that if 4, + 0, each of the two matrices 
As (E” 4 HR”) , Ao (EB? + F”) 


has oniy the single elementary divisor (A —A.)?. Therefore for a non- 
singular matrix A having the elementary divisors (41) we have, apart from 
(III) and (IV), the representations 


A=T,{A, (2,4 4), A,(E, + H,), ose A, (Ey + H,)} Te", (V) 
A=T7,(4, (Ey +F,), A(Es+ Fo), 6, AE, + Fu) } Ts’: (VD) 


§ 7. The Elementary Divisors of the Matrix f(A) 


1. In this section we consider the following problem: 


Given the elementary divisors (in the field of complex numbers) of a 
matriz A = || au ||1 and given a function f(A) defined on the spectrum of A, 
to determine the elementary divisors (in the field of complex numbers) of 
the matriz f(A). 

The matrix f(A) does not alter if we replace the function f(A) by a poly- 
nomial that assumes on the spectrum of A the same values as f(A) (see 
Chapter V, §1). Without loss of generality we may therefore assume in 
what follows that f(A) isa polynomial. 

We denote by 


(A—A,)", (A—A_)”, 2. (A—A,)™ 


the elementary divisors of A.?3_ Thus A is similar to the Jordan matrix 
A=TJT’, 
and so a 
| f(AV=TH(J)T 


Sse ses es 
23 Among the numbers-d,, 4,,..., 4, there may be some that are equal. 


VI. EQuiIvALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES 


454 
Moreover, 

J={(I, 5g, Uy, AHAB + APO G=1, 2,..., 4) 
and 


F(J)={t (1), F(A), «ees FD}, (45) 
where (see Example 2 on p. 100) 


ro 
f(a.) He Fe 
0 a ; | 
HIQ=| - is, “| (46) 
' f’ (a) 
I! 
| 0 0 fw : 


Since the similar matrices s(A) and f(J) have the same elementary 
divisors, we shall from now on consider f(J) instead of f(A). 


g, Let us determine the defect’* d of f(A) or, what is the same, of f(J). 
The defect of a quasi-diagonal matrix is equal to the sum of the defects of 
the various diagonal blocks and the defect of f(Ji) (see 46) ) is equal to the 
smaller of the numbers &; and p;, where &; is the multiplicity of 4; as a root 


of f(A),?* so that 
fAJ=P (Adee = ho MAD =0, f(a) KO (i= 1, 2, ..., wd. 


We have thus arrived at the following theorem : 
THEOREM 8: The defect of the matrix f(A), where A has the elementary 
divisors 


(A —A,)" ’ (a i A,)™ # See's (A i—-3 A.) (47) 


as given by the formula 


d =2 min (K,, ,); (48) 


24 d==n —r, where r is the rank of f(4). If the elementary divisors of a matrix are 
known, then the defect of the matrix is determined as the number of elementary divisors j 
corresponding to the char-cteristic value 0, i.e., as the number of elementary divisors of 
the form 4°. 

25%; may be equal to zero; in that case f(A:) 5 0. 


§7. THe ELEMENTARY Dvisors oF A Matrix 155 


here ky is the multiplicity of 4, as root of f(A) (c=1, 2,..., w).78 

As an application of this theorem we shall determine all the elementary 
divisors of an arbitrary matrix A = | ai, |i? that corresponds to a charac- 
teristic value A): 


Avg yee, h—dgy (A Ag), 0025 (A—Ag)®s 26 (A Ag), «02s (A Ag)”, 
91 Je Im 
where 9, = 0 (1=1,2,...,2%—1), gm > 0, provided the defects 


d,, d,, re | 


of the matrices i 


A—A),E, (A —AgH)*, ..., (A —AgE)™ 
are given. 

For this purpose we note that (A — 4, E)’=f;(A), where f;(4) = (A—A))/ 
(7=1,2,...,m). Im order to determine the defect of (A — 4,£Z)’ we have, 
therefore, to set k; = 7 in (48) for the elementary divisors corresponding to 
the characteristic value J, and k; = 0 for all the other terms (7 = 1, 2,...,m). 
Thus we obiain the formulas 


P+ Got Igter-t+ In=41, 
Gy + 29, +293 +++++ 29, =ade, 
91+ 29, +393 + +++ + 39m = ds, (49) 
Jy + 2g +395 +o 2° + MO, ay - 
Hence?’ 
jj = 2d, —d,_, —4,, (7 =1.2,...,m; dg=0, d,,, =d,,). (50) 


3. Let us return to the basic problem of determining the elementary divisors 
of the matrix f(A). As we have mentioned above, the elementary divisors 
of f(A) coincide with those of f(J) and the elementary divisors of a quasi- 
diagona) matrix coincide with those of the diagonal blocks (see Theorem 5). 
Therefore the problem reduces to finding the elementary divisors of a matrix 
C of regular triangular form: 


26 In the general case, where f() is not a polynomial, then min (ki, p.) in (48) has to 
be interpreted as the number p, if 


fOAO =f’ Cs) =... =f (Av) =0 
and as the number ki = px if 
f(r) =f’ (a) So = FHM) =O, 


FED) 0 
(¢=1,2,...,%). 
27 The number m is characterized by the fact that dy_1< dm—= dmi; (f= 1,2,...). 


156 VI. EQuIvALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES 


471) a, ° e e ay_41 


? 
p-l 0 Ay F : | 
C= SaH=|- -° +. |. (51) 
r=x0 . ° ; a 


| 0 0... .a@ |! 
We consider separately two cases: 


1. a, 0. The characteristic polynomial of C is obviously equal to 
D, (A) =(A—ap)’. 
Since D,_,(4) divides D,(A) without remainder, we have 


D,_1 (A) =(A—ay)’ (9sp)- 


Here D,_,(4) denotes the greatest common divisor of the minors of order 
p — 1 in the characteristic matrix 


A— ap — ay,. rn —Qp_1 
0 A—Qy ° ‘ 
AE—C= 
ay 
+ ° 
0 0 ° e e A— ap 


It is easy to see that when the minor of the zero element marked by ‘+’ 
is expanded, every term contains at least one factor A — do, except the prod- 
uct of the elements on the main diagonal, which is (— a,)?~* and is there- 
fore in our case different from zero. But-since D,_,(4) must be a power 
of A — dp, we see that g=0. But then it follows from 


D,y(A)=(A—ay)”, Dp (A) =1 


that C has only the one elementary divisor (4 — dy)?. 


2. a, ==...=d&-1=0, a, 0. In this case, 
CO =a +0,H" +-+++a, ,H?™ 
Therefore for the positive integer j the defect of the matrix 


(C —a,B)'= aH" 4 vee 
is given by 


§7. THe ELEMENTARY Divisors or A MATRIX 157 


d =|" when kjisp, 
= 


P, when kj>p. 
We set 


p=gqk+h (0<h<k). (52) 
Then?® 
d, =k, d, = 2k, -..,d, = gk, di41 =P. (53) 


Therefore we have by (50) 
Jy c= 9,-1 — 9, Jg=k—h, Jo41— h. 
Thus, the matrix C has the elementary divisors 


(A— ay)", as (A —a,)** ’ (A—ay)’, . A —y)* (54) 
h k—h 


where the integers g > 0 and h = U are determined by (52). 


4, Now we are in a position to ascertain what elementary divisors the matrix 
f(J) has (see (45) and (46)). To each elementary divisor of A 


(A — Ag)? 


there corresponds in f(J) the diagonal cell 


f’ (Ag) gros) 
Ha) TP aye 
ee 4.) 0 f (Ag) . : 
{ gl + H) = 3". ee ee . (55) 
7 i ° . f’ (Ag) 
on 
, O 0 *f (Ag) 


Clearly the problem reduces to finding the clementary divisors of a cell 
of the form (55). But the matrix (55) is of the regular triangular form 


(51), where 


ma 
y= f (Ag) Oy =f (Ag)s y= 2, 


Thus we arrive at the theorem: 


28 In this case the number q + 1 plays the role of m in (49) and (50). 


158 VI. EQuivaLENT TRANSFORMATIONS OF POLYNOMIAL Ma‘rTRICES 


THEOREM 9: The elementary divisors of the matrix f(A) are obtained 
from those of A an the folluwing way: To an elementary divisor 


(A — Ag)? (56) 
of A for p=1 or for p> 1 and f'(d,’ AO there corresponds a single ele- 


mentary divisor 


(A —f (Ag) (57) 


of f(A); for p > land f(A) =... =f¥ (A) = 9, fP(29) £0 (Kk <p) to the 
elementary divisor (56) of A there correspond the following elementary 


divisors of f(A): 
(A—f(d))™ ern €. “ayy, (A—f (Ag)... (AF (Ag), (58) 


h k—h 
where 
p=qkt+h, OSq, OSh<&K; 
finally, for p > 1, f(A) =... =f? M(A,) =O, to the elementary divisor (56) 


there correspond p elementary divisors of the first degree of f(A) : 
A—f (dg), -.-, A—F (Ao) - (59) 


We note the following special cases of this theorem. 

1. If Ay, d2,...,4n are the characteristic values of A, then f(Ay), FCe). 
...,f(An) are the characteristic values of f(A). (In both sequences each 
characteristic value is repeated as often as its multiplicity as a root of the 
characteristic equation indicates. )*° 

2. If the derivative f’(A) is not zero on the spectrum of A,* then in 
going from A to f(A) the elementary divisors are not ‘split up’ 2.e., of A 
has the elementary divisors 


(A—a,)", (A— Aa) aes (A—A,)" 
then f(A) has the elementary divisors 
(A—f(Ay))", (Af (Ag))", -- 2, (AF (An) 


Pn 


ed 


29 (57) is obtained from (58) by setting k = 1; (59) is obtained frum (58) by setting 
k==pork> p. 

80 Statement 1. was established separately in Chapter IV, p. 84. 

31 Tie., f (Ac) # O for those A, that are multiple roots of the minima] polynomial. 


§ 8. GeneraL METHOD or ConstTRUCTING TRANSFORMING Matrix 159 


§ 8. A General Method of Constructing the Transforming Matrix 


In many problems in the theory of matrices and its applications it is 
sufficient to know the normal form into which a given matrix A= | ix |? 
ean be carried by similarity transformations. The normal form is com- 
pletely determined by the invariant polynomials of the characteristic matrix 
AE —A. To find the latter, we can use the defining formulas (see (10) on 
p. 139) or the reduction of the characteristic matrix AE — A to canonical 
diagonal form by elementary transformations. 

In some problems, however, it is necessary to know not only the normal 
form A of the given matrix A, but also a non-singular transforming matrix T. 


1. An immediate method of determining T cousists in the following. The 
equation 


A=TAT>-? 
ean be written as: 
AT—TA=O. 


This matrix equation in T is equivalent to a system of n? linear homogeneous 
equations in the n? unknown coefficients of 7’. The determination of a 
transforming matrix reduces to the solution of this system of n? equations. 
Moreover, we have to choose from the set of all solutions one for which 
|7| 340. The existence of snch a solution is certain, since A and A have 
tthe same invariant polynomials.*? 

Note that whereas the normal form is uniquely determined by the matrix 
A,** for the transforming matrix T we always have an innumerable set of 
values that are given by 


T= UT,, (60) 


where 7; is one of the transforming matrices and U is an arbitrary matrix 
that is permutable with A.** 


22 From this fact follows the similarity of 4 and A. 


33 This statement. is unconditionally true as regards the first natural normal form. 
As far as the second normal form or the Jordan normal form is concerned, they are 
uniquely determined to within the order of the diagonal blocks. 


34 The formula (60) may be replaced by 
fies TV, 


oe 


where V is an arbitrary matrix permutable with A. 


160 VI. EqQutvaLeENnt TRANSFORMATIONS OF POLYNOMIAL MATRICES 


The method proposed above for determining a transforming matrix T is 
simple enough in concept but of little use in practice, since it requires a great 
many computations (even for n= 4 we have to solve 16 linear equations). 


2. We proceed to explain a more efficient method of constructing the 
transforming matrix 7. This method is based on the Supplement to Theorem 
7 (p. 148). According to this, we can choose as the transforming matrix 


T=Q(A), (61) 
provided 
AE — A=P(A) (AE — A)Q(A). 


The latter equation expresses the equivalence of the characteristic matrices 


JE —A and AE—A. Here P(A) and Q(A) are polynomial matrices with 
constant non-zero determimants. 
For the actual process of finding Q(A) we reduce the two A-matrices 


AE — A and AE —A to canonical form by means of the corresponding 
elementary transformations 


{tn(A), m—1(A), -. +, (A) } = Pi (A) (AE — A) Qi (A) (62) 
{tn(A), tr—1(A),..., (A) } = Po(A) (AB — A) Q2(A) (63) 
where 
Qi(A) = T;T2,..-, Tp, Q2(A) = TTT? . Th, (64) 
and where T;,..., Tp, T*%,..., Tp are the elementary matrices correspond- 


ing to the elementary operations on the columns of the A-matrices 4H — A 
and AE — A. From (62), (63), and (6+) it follows that 


AE —~ A= P(A) (AE — A)Q(A), 
where 


Q(A)= Q1(A) Qa! (4) = TT ys Ty TAIT PTT (65) 


We can compute the matrix Q(A) by anplying successively to the col- 
umns of the unit matrix E the elementary operations with the matrices 
ee Tc Tp, },..., Ty~). After this (in accordance with (61)) we 
replace the argument 4 in Q(A) by the matrix A. 

Example. 

i 23 1 —1 
Bess — 3 3 —6& 4 

8 —4 3 —4 

15 —10 ll —ll 


§ 8. GENERAL MetTHop oF CONSTRUCTING TRANSFORMING MATRIX 161 


Let us introduce a symbolic notation for the right elementary operations 
and the corresponding matrices (see pp. 130-131) : 


T’= [(c) 8], T’ =[6 + @ (4)) 7], T°" =[H]. 


In transforming the characteristic matrix AE — A into normal diagonal 
form we shall at the same time keep a record of the elementary right opera- 
tions to be performed, i.e., the operations on the columns: 


\ oe oe ae | 1 0 0 0 1 
_ 34—3 5 —4 44—1 A+1 1 —4 
a —8s8 4 A4—3 4 , —4,—4 0 A+1 4 : 
—15 10 —ll &44ll —A2—10A—4 —A—1 2 A411 
| 1 0 0 0 
—4 A+1-1 44—-1 ; 
| 4 0 ati —44—4 : 
A+11] —A—1 4A —A?—10I~—4 
l 0 0 0 1 0 0 0 
0 A+l <-1 44—1 0 0 1 0 
0 0 A+1 —44—4)\|’ ||0 —A*—22—1 A+1 —422?—7A~3]]|’ 
0 —A—1 4 —A—10I—4 0 —A?—2A—1 A — A2?—-91—4 
|}1 0 0 0 
‘Oo 1 0 0 
a 1 et et 77 8 | ||’ 
0 A —AP—2A—1 —5A2—91—4 |, 
11 0 0 0 1 0 0 0 
0 1 0 0 0 L 0 0 
0 0 A®42441 44947243 ||’ |10 0 4242441 —A?—34—2 |’ 
0 0 £942241 547+91 44 0 0 A8#4+244+1 ae eee | 
1 0 0 0 
0 1 0 0 ; 
0 0—A?—2~—1 A+1 . 
0 0 A§+244+1 —/4—31—2 
10 0 0 1 0 0 0 | 
01 0 0 01 0 0 
00 A+1 —A2%—24—1 ||’ |1 0 0 Fees | 0 , 
0 0—A?—3A—2 8942441 0 0—Aa*—34—2 —(A+4+1)* 
10 O 0 
01 0 0 
00A+1 0 ° 
00 OO (4+1) 
Here 


Q; (4) = (1 + (1 — A) 4) [2 — 4] [3 + 4] (14) (2—(A in 1) 3) [4 + (1 — 44) 3] [23] x 
x [4— (5) 3] [43] [4 + (4 + 1) 3). 


162 VI. EQuivaLENT TRANSFORMATIONS OF POLYNOMIAL MATKICES 


We have found the invariant polynomials (A +1)’, (A+1), 1, and 1 
of A. The matrix has two clementary divisors, (A +1)* and (A4+1). There- 
fore the Jordan normal form is 

—’ 1 0 0 | 

Fan 0 —1 1 0 | 
> 0 o--—l 0 | 

0 0 0 —i | 

By elementary operations we bring the matrix AE — J into normal diago- 
nal form 


A+l —1 0 0 | lA+1 —1 0 o | 
i 0 a4+1 —1 0 0 0 —1 0 
eas 0 0 j%&£4+1 #0 : 0 (a+? 244+1 «20 
0 0 ‘0 A+] 0 0 0 Aa+l1 
| 0 —1 0 0 
0 0 —1 0 
(A+1)8 (441? 441 #0 : 
0 0 0 A+i 
0 10 0 r 0 90 0 | 
0 01 0 01 0 0 
a+ip00 0 |’ loo o jt ||’ 
0 004+1 | 0 0a+1 0 
10 O 0 | 
01 0 0 
002+1 +0 : 
00 O (a4+)1) 
Here 
Q, (4) =[2 + (4 + 1) 3] [1+ (4+ 1) 2] [12] [28] [34]. 
Therefore 


Q(A) = Q, (4) Qa? (A) 
= [1+ (1—A)4] (2—4) (3+ 4} [14] [2—(4 + 3)3) [4-+ (1 —4)3] [23] [4—(5) 3] 
x [43] [4 + (A + 1) 3] [34] [23] [12] [1 — (4 + 1) 2] [2—(4 + 3) 3). 


We apply these elementary operations successively to the unit matrix E: 


100 0 1 0 00 
pa_||9 20090 |. 0 1 00 |) 
0010 0 0 10 
0001 f= )> 
0 00 1 0 0 o 1 |! 
0 10 #0 0 1 0 O : 
0 01 0 “We eO! 22h a 0 ; 
1—1 1 1—A lL 2-9: 4. -Te=3 


§ 8. GENERAL MEtTHop or CoNSTRUCTING TRANSFORMING Matrix 163 


© 
le) 

a 
~~ 
> | 

ee 
<~ x 
| | 

oC = = 

— 
~~ 
eo) 

ee de 
ma NN 
mt NN 

oO «= | | 
<< S 
| | 

oe | 

oo Oo #& 
~~ << 
wt 06 

cae | 
ea Ol 

OO = = 
aN 

ee 
~~ xX 
| | 

ooors 


° 


=] 
© 
ot bee 
oe 
oO CO = x 
oo © «4 
19 
~~ 
+g 3+ 
eT + 
Pe 
“a 
+° 
we 
72 
~ 
ie 
D> 
© 
Ten) 

N 
OO m= mem 
oo O° -& 

ma NN 
Sesh 
<< 
| | 
© 
a es 
as 
— 
oo 0 #4 


—A—i 1446 
—A 1 


424.6445 
104 + 9 


Thus 


© 
® 
Ce 
~« 
oO = x 

a" 

| ~ 

| 
gi 
eee 
ge x 
7 +8 
tes 
ow 

Il 
<= 
> 


6 
12 


§—11 
01 


Observing that 


we have 


164 Vi. EQuiIvaLENT TRANSFORMATIONS OF POLYNOMIAL MatTRICES 


Check: 
6 oak:.. Wee 4 o—1 1-1] 
ay | 6 —5 5 —1 6 —5 5 
AT = Sk Sas oo = O—4 3 — 5]|’ 
| pw n—w 1—12 ll —12!! 
i.e., AT= TJ. 
0 10 21 
= 1—5 0—5] __ 
—l 110 12 
Therefore 
A=TJT". 


§ 9. Another Method of Constructing a Transforming Matrix 


1. We shall now explain another method of constructing a transforming 
matrix which often Jeads to fewer computations than the method of the 
preceding section. Tlowever, we shall apply this second method only when 


the Jordan normal form and the elementary divisors 
(A— A)", (A —Ag)”*,... 


of the given matrix A are known. 
Let A= TJT—}, where 


Pi 
A 1... 0 
ry os 
JA, BP) + AP), gp) WM), y= |]0 . At Pa 
Age Ves. 4 8 
ut 
0 "Ag 


the matrix equation 


AT =TJ 


Then denoting the k-th column of T by t, (k=1,2,...,n), we replace 


§ 9. ANoTHER METHop or CoNSTRUCTING TRANSFORMING Matrix 165 


by the equivalent system of. equations 


Aty= At, At, =A,t, + t1,.->A tp, = Ait, + yA (67) 

Atp 41 = Astp41, Atp,p2 = Agtr 42 + trrgis.- +» Alprmn= Astrr + tr4p-1 (68) 
which we rewrite as follows: 

(A —A,F)t, =O, (A—A,E)t, = t,,...,(A—A,B) ty, = by-1 (67’) 

(A — A,B) tp,41 =O, (A — A,B) type = t,41,-- +» (A — AE) typ, =trpp,-1 — (88’) 


ee 8 e@© @© © #© © @© e© e@ 8 © e@ © © 8 #© © #© © © @ © 2&© © © © © &©» © © © 


Thus, all the columns of 7 are split into ‘Jordan chains’ of columns: 
[t1, te, --- sto], [losis boter-+-s tran, - 

To every Jordan block of J (or, what is the same, to every elementary 
divisor (66)) there corresponds its Jordan chain of columns. Each Jordan 
chain of columns is characterized by a system of equations of type (67), 
(68), ete. 

The task of finding a transforming matrix 7 reduces to that of finding 
the Jordan chains that would give in all n linearly independent columns. 

We shall show that these Jordan chains of columns can be determined by 
means of the reduced adjoint matrix C(A) (see Chapter IV, § 6). 
For the matrix C(A) we have the identity 


(AB — A)C(A)= (AE. (69) 
where y(A) is the minimal polynomial of A. 
Let 
vy (A) = (A — g)y (A) (x (4g) 0). 


We differentiate the identity (69) term by term m — 1 times: 


(AB — A) C’ (4) + 0 (A) = yy (2) EB 
(AB — A) 0” (2) + 20" (2) =p" (2) B 70) 


e e ee @® oe o«  ® $e®© je j@ $@ @  @  @ oe  «¢ e® eo 


(AZ — A) CO) (4) + (m — 1) CO) (4) =p MY) (A) EB. 


Substituting A, for 4 in (69) and (70) and observing that the right-hand 
sides are zero, we obtain 


(A —4,E)C=0, (A—A,F) D=C, (A—A, FE) F=D,...,(A—A,E)K =G; (71) 
where 


1 Lin eee 
C=C (A,), D=5,0 (A,); F=5,¢ (Ag)esaie ny ar ey C(m-2) (A,) as 


K= C(m-1) (Ao) . 


1 
(m— 1)! 


166 VI. EqQuivaLENtT TRANSFORMATIONS OF POLYNOMIAL MATRICES 
In (71) we replace the matrices (72) by their k-th columns (4 =1, 2,..., 
m). We obtain: 
(A —- A,E) Cy == 0, (4 — AE) Dy = Cys...) (A — 4,8) Ky =O (73) 
(A=1,2,...,n). 
Since C= C(A,) 4 O,* we can find a k (=) such that 
C 0. (74) 


Then the m columns 
Ci; Dy, Fuses G., K, (75) 


are linearly independent. For let 


yO, + 6Dp +... + Ky =o. (76) 

Multiplying both sides of (76) successively by A —2.E...., (4 —A,E)™—}, 
we obtain 

oC, +... + xGy=o,...,xCy=o. (77) 


From (76) and (77) we find by (74): 


y=o=...=>x%=0. 


Since the linearly independent columns (75) satisfy the svstem of equa- 
tions (73), they form a Jordan chain of vectors corresponding to the ele- 
mentary divisor (A—A,.)™ (compare (73) with (67’)). 

If C,= 0 for some k, but D; + o, then the columns D,...., Gy, K; form 
a Jordan chain of m —1 vectors, ete. 


2. We shall now show first of all how to construct a transforming matrix T 
in the case where the elementary divisors of A are pairwise co-prime : 


(A— Ar)™, 6. CAA, 
(AA; for 14 9;1,j=1,2,...,8). 


With the elementary divisor (4 — 4;)”? we associate the Jordan chain of 
columns . 
CH), DO, ... , QM, KM, 


constructed as indicated above. Then 
(A -- 4,8) OY = 0, (A — A,B) D =C, ...,(A —AE) KO =Q”, (78) 


When we give to 7 the values I, 2,..., s, we obtain s Jordan chains containing 
m columns in all. These columns are linearly independent. 


35 From C(A.) ==O it would follow that all the elements of C(A) have a common 
divisor of positive degree, in contradiction to the definition of C(A). 


§9. ANOTHER METHOD or ConstRucTING TRANSFORMING MatTHx 167 


For, suppose that 


D> [yc ae 6,D™ i eer 2 x KO | = Oo. (79) 


jal 
We multiply both sides of (79) on the left by 
(A — A,B). - (A — A;_, BY" (A — A,B" (A — Ay BY+- (A — A, BY™ (80) 


and obtain 
5 = 0 ‘ 


Replacing m;— 1 suecessively hy m;— 2, m; —3,... in (80), we find ; 
yj=6=...=4=0 (GF =1,2,...,8), 
and this is what we had to prove: 
We define the matrix 7 by the formula 
T=(C%, DY, ..., KE, co, D®, 2, Ks ...;C, DO, ..., K), (81) 


Example. 
8 8 —10 —3 
a | y (A) =A (A) = (A— 12 (4-4 1) = AS — 27 4 OL, 


2 |I 
I 2 3—2 —4 | , elementary divisors: (A — 1%) , (A + 1)?, 
2) p(w) —y(a 
3) anw) = POSH Hag det 2) BOA, 


oo ¢ oe Oo @® e@ @® oe 


C(a)= W (AE, A) = A? + AA® + (42 — 2) A + (48 — QA) EK. 
We make up the first column C,(A): 
C, (A) = [49], + 4 [A7}, + (A? — 2) Ay + (429 — 24) £,. 


For the computation of the first column of A*® we multiply all the rows 
of .4 into the first column of A. We obtain :** [A?], =(1,4, 0,2). Multiply- 
ing all the rows of A into this column, we find: [4°], = (3, 6, 2.3). 


Therefore 
| 3 I I 3! ] lee kee 
6 4 | 2 | () O22. 4h + Q 
a= a t4llo ft te? I —_— ‘| ae ee | 
13 2 11 | lo AP 492 4-1 | 


36 The columns into which we multiply the rows are written underneath the cows of 4. 
The elements of the row of column-sums are set up in italics, for checking. 


168 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES 


Hence €;(1) = (0,8,0,4) and C7(1) = (8,8,4,4). As C,(—1)= 
(0, 0,0,0), we pass on to the second column and, proceeding as before, we 
find: C2(— 1) = (—4, 0, —4,0) and Cs(— 1) = (4, —4, 4,4). We set 
up the matrix: 


|0 8 —4 4 
(0, ), 0% (1) 5 OD), %(—D) = | ; : > a 
l44 0—4 


We cancel** 4 in the first two columns and —4 in the last two columns. 


We leave it to the reader to verify that 


1 0 0 
1 0 0 
0 1 1 
0 0 J 


ooo Ff 


3. Coming now to the general case, we shall investigate the Jordan chains 
of vectors corresponding to a characteristic value 4) for which there are p 
elementary divisors (A—A,.)™, q elementary divisors (A —4A,)™~?, r ele- 
mentary divisors (A — A,)”~?, ete. 

As a preliminary to this, we establish some properties of the matrices 


C=C (4), D=C' (A), F = ne” (Ag), «+5 K= = WY om (4). (82) 


(m 


1. The matrices (82) can be rcpresenitcd in the form of polynomials in A: 


C=h, (4), D=h,(A), .... K=hm (A), (83) 
where 
h, =e (6 =1,2, «2 m). (84) 
For 
C (4) = W (AB, A), 
where 


37 A Jordan chain remains a Jordan chain when all its columns are multiplied by a 
number c * 0. 


§ 9. ANOTHER METHOD oF CoNStTRUCTING TRANSFORMING MATRIx 169 


Therefore 
1 ] : 
FO (a) = 4 (40 AD, (85) 
where 
1 1 [ a 
_. pF) Ans = 7/5" A, | 
git Com gil age? (OH) das 
1 i * y (4) 
RY laa® p— Alana, (u—Ay)*t! oe 


(83) follows from (82), (85), and (86). 
2. The matrices (82) have the ranks 


p,2p+q, 3p+2%4q+7,.--- 


This property of the matrices (82) follows immediately from 1. and 
Theorem 8 (§7), if we equate the rank to n—d and use formula (48) 
for the defect of a function on A (p. 154). 

3. In the sequence of matrices (82) every column of each matriz is a 
linear combination of the columns of every following matriz. 

Let us take two matrices hi(A) and hy(A) in (82) (see 1.). Suppose 
that i<k. Then it follows from (84) that: 


h; (A) = hy (A) (A—4, EYP. 


Hence the j-th column y; (j= 1, 2,..., 7) of h,( A) is expressed linearly by 
the columns 21, 22,-.., 2n Of h;,(A): 


n 
Yj= Dd) % %: 
g=1 
where a1, @2,..., Gn are the elements of the j-th column of (A —A,E)*~?. 


4. Without changing the basic formulas (71) we may replace any col-' 
umn in C by an arbitrary linear combination of all the columns, provided 
we make the corresponding replacements in D,..., K. 

We now proceed to the construction of the Jordan chains of columns for 
the elementary divisors 


Aes Gai. aa. 
eee 
P q 
Using the properties 2. and 4., we transform the matrix C into the form 


O =(Cy, On» «05 Cys 0,0, v +4 0)3 (87) 


170. «Vi. EQuivaLENT TRANSFORMATIONS OF POLYNOMIAL MATRICES 


» x 


where the columus (), Co, .... Cp are linearly independent. Now 


DED Dg cdg Di Dea Dae 


By 3., for every ‘(11S p) €(, is a linear combination of the columns 


Dy, Do, ere fs 
O;=a,Dy +++ + tpDp + eer Dir + +++ + tnDn- (88) 


We multiply both sides of this equation by A —j,E. Observing (see (73) ) 


that 
(A—A,E)Cj=0 (i=1,2,...,p). (A—AE) Dj =Cy (F=1,2,..-52), 


we obtain by (87) 
0 = 4,0, + aC, + 20+ + aC; 


hence in (88) 
++ =a,— 0. 
Therefore the columns C;. Co,..., Cp are linearly independent combinations 
of the columns Dy41,.--.Dn. Therefore by 4. and Do we ean, without chang- 
ing the matrix C, take the columns C,,.... (’, instead of Dpyi,..-. Dey and 
zeros instead of Dop4¢41,---) Dn. 
Then the matrix D assumes the form 


D=(Dyy «+5 Dp Cys Op «+ +1 Ops Dopts + «+2 Doptgs 0 O «+43 0). (89) 


In the same way, preserving the forms (87) and (89) of the matrices C and 
D, we can represent the next matrix F in the form 


F=(F,, e008 PF); Di, oes Dy: Ponti Popigd Cis ees Cy; \ 


90 
Doptis sees Dopige Fsptoqtis +++ Fappogtrs Or ++ +5 0), J on 
etc. 
Formulas (73) gives us the Jordan chains 
m ™ 
A ee nape SS, 
(Oi Dinavcg Mle saqe (Oa Dy, acu Ia): 
ee Og tl 
: (91) 


m1 m1 


en TN, 
(Dop+1> Pop+1. eves Kops): eee (Doptg Poni ove Kop+q)3 eee 
q 


These Jordan chains are linearly independent. For all the columns C, 
in (91) are linearly independent, because they form p linearly independent 
columns of C. All the columns C;, D; in (91) are independent, because they 
form 2p + q independent columns in D, etc.; finally, wl the columns in (91) 


§ 9. ANOTHER METHOD or CONSTRUCTING TRANSFORMING Matrix 171 


are independent, because they form %=mpt+ (m—1)gt...mdependent 
columns in K. The number of columns in (91) is equal to the sum of the 
exponents of the elementary divisors corresponding to the given character- 
istic value Ao. 

Suppose that the matrix A = | Aix [|i has s distinet characteristic values 


(j=1,2,...,8; 
A (A) = (A—A,)™ (A—Ag)™ 0 (A—A,)" 
p (A) = (A—Ay)™ (A— Ag) (A—A,)™ ). 
For each characteristic value A; we form its system of independent Jordan 


chains (91); the number of columns in this system is equal to n; (;= 1, 2. 
3,...,8). All the chains so obtained contain n = 1; + no+... +n, cvulumns. 


Aj 


These n columns are linearly independent and form one of ‘he required. 
transforming matrices T. 
The proof of the linear independence of these » columns proceeds as 


follows. 
Every linear combination of these m columns can be represented in the 


form 
$ 
a H, =o, (92) 
j= 


where H;, is a linear combination of columns in the Jordan chains (91) 
corresponding to the characteristic value A; (7 = 1. 2,...,8). But every col- 
umn in the Jordan chain corresponding to the characteristic value 4; satisfies 


the equation 
(A A,B)"™izx=o. 
Theretore 
(4 — A,B)" Hy =o. (93) 


We take a fixed number 7 (178) and construct the Lagrange- 
Sylvester interpolation polynomial r(1) (See Chapter V, §§ 1, 2) with the 
following values on the spectrum of the matrix: 

r (Ag) re’ (Ay) =>... er ™) (4,)= 0 for tj 
and 
r (A= 1,9 (A) --- = YD (4) = 0. 

Then, for every 1 j, r(A) is divisible by (A — 4.) without remainder ; 

therefore by (93), 
r(A)H;=o (8))- (94) 


172 »=VI¥. EQuivaLent TRANSFORMATIONS OF POLYNOMIAL MaTRICES 


In exactly the same way, the difference r(2) —1 is divisible by (A ay)? 
without remainder; therefore 


Multiplying both sides of (92) by r(.4). we obtain from (94) and (95): 


H;==0. 


This is valid for every j= 1,2,....s. But H; isa linear corabination of 
independent columns corresponding to one and the same characteristic value 
4, (J=1,2....,8). Therefore all the coefficients in the linear combination 
H; (j7=1,2,...,8), and hence all the coefficients in (92), are equa! to zero. 

Note. Let us point out some transformations on the columns of the 
matrix 7 under which it is transformed into the same Jordan form (with the 
same arrangement of the Jordan diagonal blocks) : 


I. Multtplication of all the columns of an arbitrary Jordan chain by a 
non-zero number. 


Il. Addition to each column (beginning with the second) of a Jordan 
chain of the preceding column of the same chain, multiplied by one and the 
same arbitrary number. 


{II. Addition to all the columns of a Jordan chain of the corresponding 
columns of another chain containing the same or a larger number of columns 
and corresponding to the same characteristic value. 


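Example 1.

$$A=\begin{Vmatrix}1&0&0&1&-1\\0&1&-2&3&-3\\0&0&-1&2&-2\\1&-1&1&0&1\\1&-1&1&-1&2\end{Vmatrix},\qquad \Delta(\lambda)=(\lambda-1)^4(\lambda+1),\qquad \psi(\lambda)=(\lambda-1)^2(\lambda+1)=\lambda^3-\lambda^2-\lambda+1.$$

Elementary divisors of the matrix $A$: $(\lambda-1)^2$, $(\lambda-1)^2$, $\lambda+1$.

$$J=\begin{Vmatrix}1&1&0&0&0\\0&1&0&0&0\\0&0&1&1&0\\0&0&0&1&0\\0&0&0&0&-1\end{Vmatrix},$$

$$\Psi(\lambda,\mu)=\frac{\psi(\mu)-\psi(\lambda)}{\mu-\lambda}=\mu^2+(\lambda-1)\mu+\lambda^2-\lambda-1,$$

$$C(\lambda)=\Psi(\lambda E,A)=A^2+(\lambda-1)A+(\lambda^2-\lambda-1)E.$$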


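Let us compute successively the columns of $A^2$ and the corresponding columns of $C(\lambda)$, $C(1)$, $C'(\lambda)$, $C'(1)$, $C(-1)$. We must obtain two linearly independent columns of $C(1)$ and one non-zero column of $C(-1)$. (Entries marked $\times$ are not needed and are not computed.)

$$C(\lambda)=\begin{Vmatrix}1&0&0&2&\times\\0&1&0&2&\times\\0&0&1&0&\times\\2&-2&2&-1&\times\\2&-2&2&-2&\times\end{Vmatrix}+(\lambda-1)\begin{Vmatrix}1&0&0&1&\times\\0&1&-2&3&\times\\0&0&-1&2&\times\\1&-1&1&0&\times\\1&-1&1&-1&\times\end{Vmatrix}+(\lambda^2-\lambda-1)\begin{Vmatrix}1&0&0&0&0\\0&1&0&0&0\\0&0&1&0&0\\0&0&0&1&0\\0&0&0&0&1\end{Vmatrix},$$

$$C(1)=\begin{Vmatrix}0&0&0&2&\times\\0&0&0&2&\times\\0&0&0&0&\times\\2&-2&2&-2&\times\\2&-2&2&-2&\times\end{Vmatrix},\qquad C'(\lambda)=\begin{Vmatrix}1&0&0&1&\times\\0&1&-2&3&\times\\0&0&-1&2&\times\\1&-1&1&0&\times\\1&-1&1&-1&\times\end{Vmatrix}+(2\lambda-1)\begin{Vmatrix}1&0&0&0&0\\0&1&0&0&0\\0&0&1&0&0\\0&0&0&1&0\\0&0&0&0&1\end{Vmatrix},$$

$$C'(1)=\begin{Vmatrix}2&\times&\times&1&\times\\0&\times&\times&3&\times\\0&\times&\times&2&\times\\1&\times&\times&1&\times\\1&\times&\times&-1&\times\end{Vmatrix},\qquad C(-1)=\begin{Vmatrix}0&0&0&\times&\times\\0&0&4&\times&\times\\0&0&4&\times&\times\\0&0&0&\times&\times\\0&0&0&\times&\times\end{Vmatrix}.$$

Therefore^38

$$T=\bigl(C_1(1),\ C'_1(1),\ C_4(1),\ C'_4(1),\ C_3(-1)\bigr)=\begin{Vmatrix}0&2&2&1&0\\0&0&2&3&4\\0&0&0&2&4\\2&1&-2&1&0\\2&1&-2&-1&0\end{Vmatrix}.$$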


The matrix $T$ can be simplified a little. We
1) divide the fifth column by 4;
2) add the first column to the third and the second to the fourth;
3) subtract the third column from the fourth;
4) divide the first and second columns by 2;
5) subtract the first column, multiplied by 1/2, from the second.


Then we obtain the matrix

$$T_1=\begin{Vmatrix}0&1&2&1&0\\0&0&2&1&1\\0&0&0&2&1\\1&0&0&2&0\\1&0&0&0&0\end{Vmatrix}.$$

We leave it to the reader to verify that $AT_1=T_1J$ and $|T_1|\neq0$.
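The whole computation in Example 1 can be checked mechanically. Below is a minimal Python sketch that uses exact rational arithmetic from the standard `fractions` module. One assumption: the entries of $A$ are the values consistent with the columns of $A$ and $A^2$ shown in the computation above. The helper names (`mat_mul`, `comb`, `col`, `from_cols`, `det`) are our own. The sketch rebuilds $C(1)=A^2-E$, $C'(1)=A+E$ and $C(-1)=A^2-2A+E$, assembles $T$, applies the five column operations and checks $AT_1=T_1J$.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def comb(*terms):
    """Entrywise sum of c * M over the given (c, M) pairs."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(cols)]
            for i in range(rows)]


def col(M, j):
    """Column j (0-based) of M."""
    return [row[j] for row in M]


def from_cols(columns):
    """Matrix whose columns are the given lists."""
    return [list(r) for r in zip(*columns)]


def det(M):
    """Determinant by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d


# Matrix of Example 1 (entries as assumed in the lead-in).
A = [[1, 0, 0, 1, -1],
     [0, 1, -2, 3, -3],
     [0, 0, -1, 2, -2],
     [1, -1, 1, 0, 1],
     [1, -1, 1, -1, 2]]
E = [[int(i == j) for j in range(5)] for i in range(5)]
J = [[1, 1, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 1, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, -1]]

A2 = mat_mul(A, A)
C1 = comb((1, A2), (-1, E))           # C(1)  = A^2 - E
dC1 = comb((1, A), (1, E))            # C'(1) = A + E
Cm1 = comb((1, A2), (-2, A), (1, E))  # C(-1) = A^2 - 2A + E

# T = (C_1(1), C'_1(1), C_4(1), C'_4(1), C_3(-1)); indices below are 0-based.
cols = [col(C1, 0), col(dC1, 0), col(C1, 3), col(dC1, 3), col(Cm1, 2)]
T = from_cols(cols)

# The five simplifying column operations, in the order listed in the text.
c = [[Fraction(x) for x in v] for v in cols]
c[4] = [x / 4 for x in c[4]]                    # 1) fifth column / 4
c[2] = [a + b for a, b in zip(c[2], c[0])]      # 2) third += first,
c[3] = [a + b for a, b in zip(c[3], c[1])]      #    fourth += second
c[3] = [a - b for a, b in zip(c[3], c[2])]      # 3) fourth -= third
c[0] = [x / 2 for x in c[0]]                    # 4) first / 2,
c[1] = [x / 2 for x in c[1]]                    #    second / 2
c[1] = [a - b / 2 for a, b in zip(c[1], c[0])]  # 5) second -= first / 2
T1 = from_cols(c)

print(mat_mul(A, T) == mat_mul(T, J))
print(mat_mul(A, T1) == mat_mul(T1, J), det(T1))
```

Running it prints `True`, then `True -4`. The non-zero determinant confirms $|T_1|\neq0$.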


Example 2.

$$A=\begin{Vmatrix}1&-1&1&-1\\-3&3&-5&4\\8&-4&3&-4\\15&-10&11&-11\end{Vmatrix},\qquad \Delta(\lambda)=(\lambda+1)^4,\qquad \psi(\lambda)=(\lambda+1)^3.$$

Elementary divisors: $(\lambda+1)^3$, $\lambda+1$.


38 Here the subscript denotes the number of the column; for example, $C_3(-1)$ denotes the third column of $C(-1)$.


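$$J=\begin{Vmatrix}-1&1&0&0\\0&-1&1&0\\0&0&-1&0\\0&0&0&-1\end{Vmatrix},$$

$$h_1(\lambda)=\frac{\psi(\lambda)}{\lambda+1}=(\lambda+1)^2,\qquad h_2(\lambda)=\frac{\psi(\lambda)}{(\lambda+1)^2}=\lambda+1,\qquad h_3(\lambda)=\frac{\psi(\lambda)}{(\lambda+1)^3}=1,$$

and the matrices^39

$$C=h_1(A)=(A+E)^2,\qquad D=h_2(A)=A+E,\qquad F=h_3(A)=E:$$

$$C=\begin{Vmatrix}0&0&0&0\\2&-1&1&-1\\0&0&0&0\\-2&1&-1&1\end{Vmatrix},\qquad D=\begin{Vmatrix}2&-1&1&-1\\-3&4&-5&4\\8&-4&4&-4\\15&-10&11&-10\end{Vmatrix},\qquad F=\begin{Vmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{Vmatrix}.$$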


For the first three columns of $T$ we take the third columns of these matrices: $T=(C_3,\ D_3,\ F_3,\ *)$. In the matrices $C$, $D$, $F$ we subtract twice the third column from the first, and we add the third column to the second and to the fourth. We obtain
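$$\tilde C=\begin{Vmatrix}0&0&0&0\\0&0&1&0\\0&0&0&0\\0&0&-1&0\end{Vmatrix},\qquad \tilde D=\begin{Vmatrix}0&0&1&0\\7&-1&-5&-1\\0&0&4&0\\-7&1&11&1\end{Vmatrix},\qquad \tilde F=\begin{Vmatrix}1&0&0&0\\0&1&0&0\\-2&1&1&1\\0&0&0&1\end{Vmatrix}.$$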


In the matrices $\tilde D$, $\tilde F$ we add the fourth column, multiplied by 7, to the first and subtract the fourth column from the second. We obtain


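$$\tilde{\tilde C}=\begin{Vmatrix}0&0&0&0\\0&0&1&0\\0&0&0&0\\0&0&-1&0\end{Vmatrix},\qquad \tilde{\tilde D}=\begin{Vmatrix}0&0&1&0\\0&0&-5&-1\\0&0&4&0\\0&0&11&1\end{Vmatrix},\qquad \tilde{\tilde F}=\begin{Vmatrix}1&0&0&0\\0&1&0&0\\5&0&1&1\\7&-1&0&1\end{Vmatrix}.$$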


For the last column of $T$ we take the first column of $\tilde{\tilde F}$. Then we have

$$T=(C_3,\ D_3,\ F_3,\ \tilde{\tilde F}_1)=\begin{Vmatrix}0&1&0&1\\1&-5&0&0\\0&4&1&5\\-1&11&0&7\end{Vmatrix}.$$

As a check, we can verify that $AT=TJ$ and that $|T|\neq0$.
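Here is a similar hedged Python sketch for Example 2, again using exact arithmetic. The column operations are encoded as hypothetical triples `(target, source, factor)`, meaning "add `factor` times column `source` to column `target`" (0-based). For simplicity the second round of operations is also applied to $\tilde C$. This changes nothing, because the fourth column of $\tilde C$ is zero.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def col(M, j):
    """Column j (0-based) of M."""
    return [row[j] for row in M]


def from_cols(columns):
    """Matrix whose columns are the given lists."""
    return [list(r) for r in zip(*columns)]


def col_ops(M, ops):
    """Apply (target, source, factor) in sequence:
    column[target] += factor * column[source]. M itself is not modified."""
    M = [row[:] for row in M]
    for t, s, f in ops:
        for row in M:
            row[t] += f * row[s]
    return M


def det(M):
    """Determinant by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d


A = [[1, -1, 1, -1], [-3, 3, -5, 4], [8, -4, 3, -4], [15, -10, 11, -11]]
E = [[int(i == j) for j in range(4)] for i in range(4)]
J = [[-1, 1, 0, 0], [0, -1, 1, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

D = [[a + e for a, e in zip(ra, re)] for ra, re in zip(A, E)]  # h_2(A) = A + E
C = mat_mul(D, D)                                             # h_1(A) = (A + E)^2
F = E                                                         # h_3(A) = E

first = [(0, 2, -2), (1, 2, 1), (3, 2, 1)]   # col1 -= 2*col3; col2, col4 += col3
second = [(0, 3, 7), (1, 3, -1)]             # col1 += 7*col4; col2 -= col4
C2, D2, F2 = (col_ops(col_ops(M, first), second) for M in (C, D, F))

T = from_cols([col(C, 2), col(D, 2), col(F, 2), col(F2, 0)])
print(col(F2, 0), col(D2, 0))
print(mat_mul(A, T) == mat_mul(T, J), det(T))
```

The first line printed is `[1, 0, 5, 7] [0, 0, 0, 0]`. Because $\tilde{\tilde D}=(A+E)\tilde{\tilde F}$, the vanishing first column of $\tilde{\tilde D}$ shows that the chosen last column of $T$ is an eigenvector. The second line is `True -1`.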


CHAPTER VII


THE STRUCTURE OF A LINEAR OPERATOR 
IN AN n-DIMENSIONAL SPACE 


(Geometrical Theory of Elementary Divisors) 


The analytic theory of elementary divisors expounded in the preceding chapter has enabled us to determine, for every square matrix, a similar matrix having ‘normal’ or ‘canonical’ form. On the other hand, we have seen in Chapter III that the behaviour of a linear operator in an n-dimensional space with respect to various bases is given by means of a class of similar matrices. The existence of a matrix of normal form in such a class is closely connected with important and deep properties of a linear operator in an n-dimensional space. The study of these properties is the object of the present chapter. The investigation of the structure of a linear operator will lead us, independently of the contents of the preceding chapter, to the theory of transformations of a matrix to a normal form. Therefore the contents of this chapter may be called the geometrical theory of elementary divisors.


§ 1. The Minimal Polynomial of a Vector and a Space 
(with Respect to a Given Linear Operator)


1. We consider an $n$-dimensional vector space $R$ over the field $F$ and a linear operator $A$ in this space.
Let $x$ be an arbitrary vector of $R$. We form the sequence of vectors

$$x,\ Ax,\ A^2x,\ \ldots. \tag{1}$$

Since the space is finite-dimensional, there is an integer $p$ ($0\le p\le n$) such that the vectors $x, Ax, \ldots, A^{p-1}x$ are linearly independent, while $A^px$ is a linear combination of these vectors with coefficients in $F$:


1 The account of the geometric theory of elementary divisors to be given here is based on our paper [167]. For other geometrical constructions of the theory of elementary divisors, see [22], §§ 96-99 and also [53].




$$A^px=-\gamma_1A^{p-1}x-\gamma_2A^{p-2}x-\cdots-\gamma_px. \tag{2}$$


We form the monic polynomial $\psi(\lambda)=\lambda^p+\gamma_1\lambda^{p-1}+\cdots+\gamma_{p-1}\lambda+\gamma_p$. (A monic polynomial is a polynomial in which the coefficient of the highest power of the variable is unity.) Then (2) can be written:

$$\psi(A)x=o. \tag{3}$$


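Every polynomial $\varphi(\lambda)$ for which (3) holds will be called an annihilating polynomial for the vector $x$. It is easy to see that, of all the monic annihilating polynomials of $x$, the one we have constructed is of least degree. A monic annihilating polynomial of degree $q<p$ would express $A^qx$ as a linear combination of $x, Ax, \ldots, A^{q-1}x$, contradicting the linear independence of $x, Ax, \ldots, A^{p-1}x$. This polynomial will be called the minimal annihilating polynomial of $x$ or simply the minimal polynomial of $x$.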

Note that every annihilating polynomial $\varphi(\lambda)$ of $x$ is divisible by the minimal polynomial $\psi(\lambda)$.

For let

$$\varphi(\lambda)=\psi(\lambda)\chi(\lambda)+\rho(\lambda),$$

where $\chi(\lambda)$, $\rho(\lambda)$ are the quotient and remainder on dividing $\varphi(\lambda)$ by $\psi(\lambda)$. Then

$$\varphi(A)x=\chi(A)\psi(A)x+\rho(A)x=\rho(A)x,$$

and therefore $\rho(A)x=o$. But the degree of $\rho(\lambda)$ is less than that of the minimal polynomial $\psi(\lambda)$. Hence $\rho(\lambda)\equiv0$.

From what we have proved it follows, in particular, that every vector $x$ has only one minimal polynomial.
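The construction of the minimal polynomial of a vector is effectively an algorithm. Generate $x, Ax, A^2x, \ldots$ until the first vector that depends linearly on its predecessors, then read off the coefficients in (2). Below is a minimal Python sketch. It assumes exact `fractions.Fraction` arithmetic stands in for a general field $F$, and the function names are ours. It is tried on the matrix of Example 2 in Chapter VI, § 9.

```python
from fractions import Fraction


def mat_vec(A, x):
    """Apply the matrix A (list of rows) to the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def express(vectors, target):
    """Exact coefficients c with sum(c[k] * vectors[k]) == target, or None.

    Assumes `vectors` are linearly independent, so a solution is unique.
    """
    n, p = len(target), len(vectors)
    # Augmented system [v_0 ... v_{p-1} | target], one row per coordinate.
    M = [[Fraction(v[i]) for v in vectors] + [Fraction(target[i])]
         for i in range(n)]
    r = 0
    for j in range(p):
        piv = next((i for i in range(r, n) if M[i][j] != 0), None)
        if piv is None:
            return None  # cannot happen for independent vectors
        M[r], M[piv] = M[piv], M[r]
        pv = M[r][j]
        M[r] = [v / pv for v in M[r]]
        for i in range(n):
            if i != r and M[i][j] != 0:
                f = M[i][j]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    # Consistent only if every leftover equation reads 0 = 0.
    if any(M[i][p] != 0 for i in range(r, n)):
        return None
    return [M[i][p] for i in range(p)]


def vector_minimal_polynomial(A, x):
    """Coefficients [gamma_p, ..., gamma_1, 1] (constant term first) of the
    monic minimal polynomial lambda^p + gamma_1 lambda^(p-1) + ... + gamma_p."""
    x = [Fraction(v) for v in x]
    if not any(x):
        return [Fraction(1)]          # p = 0 for the null vector
    krylov = [x]                      # x, Ax, ..., A^(p-1) x: independent
    while True:
        nxt = mat_vec(A, krylov[-1])
        c = express(krylov, nxt)      # is A^p x = sum_k c_k A^k x ?
        if c is not None:
            return [-ck for ck in c] + [Fraction(1)]
        krylov.append(nxt)


A = [[1, -1, 1, -1], [-3, 3, -5, 4], [8, -4, 3, -4], [15, -10, 11, -11]]
print([str(c) for c in vector_minimal_polynomial(A, [1, 0, 0, 0])])
print([str(c) for c in vector_minimal_polynomial(A, [1, 0, 5, 7])])
```

The first call gives the coefficients of $(\lambda+1)^3=\lambda^3+3\lambda^2+3\lambda+1$, printed as `['1', '3', '3', '1']`. The second gives $\lambda+1$, printed as `['1', '1']`.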


2. We choose a basis $e_1, e_2, \ldots, e_n$ in $R$. We denote by $\varphi_1(\lambda), \varphi_2(\lambda), \ldots, \varphi_n(\lambda)$ the minimal polynomials of the basis vectors $e_1, e_2, \ldots, e_n$ and by $\psi(\lambda)$ the least common multiple of these polynomials ($\psi(\lambda)$ is taken with highest coefficient 1). Then $\psi(\lambda)$ is an annihilating polynomial for all the basis vectors $e_1, e_2, \ldots, e_n$. Since every vector $x\in R$ is representable in the form $x=x_1e_1+x_2e_2+\cdots+x_ne_n$, we have

$$\psi(A)x=x_1\psi(A)e_1+x_2\psi(A)e_2+\cdots+x_n\psi(A)e_n=o,$$

i.e.,

$$\psi(A)=O. \tag{4}$$


The polynomial $\psi(\lambda)$ is called an annihilating polynomial for the whole space $R$. Let $\varphi(\lambda)$ be an arbitrary annihilating polynomial for the whole space $R$. Then $\varphi(\lambda)$ is an annihilating polynomial for the basis vectors $e_1, e_2, \ldots, e_n$. Therefore $\varphi(\lambda)$ must be a common multiple of the minimal polynomials $\varphi_1(\lambda), \varphi_2(\lambda), \ldots, \varphi_n(\lambda)$ of these vectors and must therefore be divisible without remainder by their least common multiple $\psi(\lambda)$. Hence it follows that, of all the annihilating polynomials for the whole space $R$, the one we have constructed, $\psi(\lambda)$, has the least degree, and it is monic. This polynomial is uniquely determined by the space $R$ and the operator $A$ and is called the minimal polynomial of the space $R$.^3 The uniqueness of the minimal polynomial of the space $R$ follows from the statement proved above: every annihilating polynomial $\varphi(\lambda)$ of the space $R$ is divisible by the minimal polynomial $\psi(\lambda)$. Although the construction of the minimal polynomial $\psi(\lambda)$ was associated with a definite basis $e_1, e_2, \ldots, e_n$, the polynomial $\psi(\lambda)$ itself does not depend on the choice of this basis (this follows from the uniqueness of the minimal polynomial for the space $R$).

2 Of course, the phrase ‘with respect to the given operator A’ is tacitly understood. For the sake of brevity, this circumstance is not mentioned in the definition, because throughout this entire chapter we shall deal with a single operator A.

Finally we mention that the minimal polynomial of the space $R$ annihilates every vector $x$ of $R$, so that the minimal polynomial of the space is divisible by the minimal polynomial of every vector in the space.
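Following the construction in the text literally, the minimal polynomial of the space is the least common multiple of the minimal polynomials of the basis vectors. Below is a Python sketch under the same assumptions as before: exact rationals, our own function names, and polynomials stored as coefficient lists with the constant term first. It is applied to the matrix of Example 1 in Chapter VI, § 9. We take that matrix's entries to be the values consistent with the worked columns of that example; treat this as an assumption.

```python
from fractions import Fraction


def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def express(vectors, target):
    """Coefficients c with sum(c[k]*vectors[k]) == target, or None
    (the vectors are assumed linearly independent)."""
    n, p = len(target), len(vectors)
    M = [[Fraction(v[i]) for v in vectors] + [Fraction(target[i])]
         for i in range(n)]
    r = 0
    for j in range(p):
        piv = next((i for i in range(r, n) if M[i][j] != 0), None)
        if piv is None:
            return None
        M[r], M[piv] = M[piv], M[r]
        pv = M[r][j]
        M[r] = [v / pv for v in M[r]]
        for i in range(n):
            if i != r and M[i][j] != 0:
                f = M[i][j]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    if any(M[i][p] != 0 for i in range(r, n)):
        return None
    return [M[i][p] for i in range(p)]


def vector_minpoly(A, x):
    """Monic minimal polynomial of x with respect to A, constant term first."""
    krylov = [[Fraction(v) for v in x]]
    if not any(krylov[0]):
        return [Fraction(1)]
    while (c := express(krylov, mat_vec(A, krylov[-1]))) is None:
        krylov.append(mat_vec(A, krylov[-1]))
    return [-ck for ck in c] + [Fraction(1)]


def trim(p):
    """Drop zero leading coefficients, keeping at least one entry."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def poly_mul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def poly_divmod(p, q):
    """Quotient and remainder of p divided by the non-zero polynomial q."""
    r, q = trim(Fraction(c) for c in p), trim(Fraction(c) for c in q)
    quot = [Fraction(0)] * max(len(r) - len(q) + 1, 1)
    while any(r) and len(r) >= len(q):
        shift, f = len(r) - len(q), r[-1] / q[-1]
        quot[shift] = f
        for i, qc in enumerate(q):
            r[i + shift] -= f * qc
        r = trim(r)
    return trim(quot), r


def poly_gcd(p, q):
    """Monic greatest common divisor by Euclid's algorithm."""
    p, q = trim(Fraction(c) for c in p), trim(Fraction(c) for c in q)
    while any(q):
        p, q = q, poly_divmod(p, q)[1]
    return [c / p[-1] for c in p]


def poly_lcm(p, q):
    """Monic least common multiple of two monic polynomials."""
    return poly_divmod(poly_mul(p, q), poly_gcd(p, q))[0]


def space_minimal_polynomial(A):
    """LCM of the minimal polynomials of the basis vectors e_1, ..., e_n."""
    n = len(A)
    psi = [Fraction(1)]
    for i in range(n):
        e_i = [int(j == i) for j in range(n)]
        psi = poly_lcm(psi, vector_minpoly(A, e_i))
    return psi


A = [[1, 0, 0, 1, -1], [0, 1, -2, 3, -3], [0, 0, -1, 2, -2],
     [1, -1, 1, 0, 1], [1, -1, 1, -1, 2]]
print([str(c) for c in space_minimal_polynomial(A)])  # constant term first
```

The result is `['1', '-1', '-1', '1']`, that is, $\lambda^3-\lambda^2-\lambda+1=(\lambda-1)^2(\lambda+1)$, in agreement with Example 1. The basis vector $e_1$ alone has minimal polynomial $(\lambda-1)^2$, a proper divisor.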


§ 2. Decomposition into Invariant Subspaces 
with Co-Prime Minimal Polynomials 

1. If some collection of vectors $R'$ forming part of $R$ has the property that the sum of any two vectors of $R'$ and the product of any vector of $R'$ by a number $\alpha\in F$ always belong to $R'$, then that manifold $R'$ is itself a vector space, a subspace of $R$.

If two subspaces $R'$ and $R''$ of $R$ are given and if it is known that

1. $R'$ and $R''$ have no vector in common except the null vector, and

2. every vector $x$ of $R$ can be represented in the form of a sum

$$x=x'+x''\qquad(x'\in R',\ x''\in R''), \tag{5}$$

then we shall say that the space $R$ is decomposed into the two subspaces $R'$ and $R''$ and shall write:

$$R=R'+R''. \tag{6}$$

Note that condition 1. implies the uniqueness of the representation (5). For if for a certain vector $x$ we had two distinct representations in the form of a sum of terms from $R'$ and $R''$, (5) and

$$x=z'+z''\qquad(z'\in R',\ z''\in R''), \tag{7}$$

then, subtracting (7) from (5) term by term, we would obtain:

$$x'-z'=z''-x'',$$

i.e., equality of the non-null vectors $x'-z'\in R'$ and $z''-x''\in R''$, which, by 1., is impossible.

3 If in some basis $e_1, e_2, \ldots, e_n$ the matrix $A=\|a_{ik}\|_1^n$ corresponds to the operator $A$, then the annihilating or minimal polynomial of the space $R$ (with respect to $A$) is the annihilating or minimal polynomial of the matrix $A$, and vice versa. Compare with Chapter IV, § 6.

Thus, condition 1. may be replaced by the requirement that the representation (5) be unique. In this form, the definition of decomposition immediately extends to an arbitrary number of subspaces.

Let

$$R=R'+R''$$

and let $e'_1, e'_2, \ldots, e'_{n'}$ and $e''_1, e''_2, \ldots, e''_{n''}$ be bases of $R'$ and $R''$, respectively. Then the reader can easily prove that all these $n'+n''$ vectors are linearly independent and form a basis of $R$, so that a basis of the whole space is formed from bases of the subspaces. It follows, in particular, that $n=n'+n''$.

Example 1. Suppose that in a three-dimensional space three directions, not parallel to one and the same plane, are given. Since every vector in the space can be split, uniquely, into components in these three directions, we have

$$R=R'+R''+R''',$$

where $R$ is the set of all the vectors of our space, $R'$ the set of all vectors parallel to the first direction, $R''$ to the second, and $R'''$ to the third. In this case, $n=3$ and $n'=n''=n'''=1$.


Example 2. Suppose that in a three-dimensional space a plane and a line intersecting the plane are given. Then

$$R=R'+R'',$$

where $R$ is the set of all vectors of our space, $R'$ the set of all vectors parallel to the given plane, and $R''$ the set of all vectors parallel to the given line. In this example, $n=3$, $n'=2$, $n''=1$.


2. A subspace $R'\subset R$ is called invariant with respect to the operator $A$ if $AR'\subset R'$, i.e., if $x\in R'$ implies $Ax\in R'$. In other words, the operator $A$ carries a vector of an invariant subspace into a vector of the same subspace.

In what follows we shall carry out a decomposition of the whole space into subspaces invariant with respect to $A$. The decomposition reduces the study of the behaviour of an operator in the whole space to the study of its behaviour in the various component subspaces.

We shall now prove the following theorem: 




THEOREM 1 (First Theorem on the Decomposition of a Space into Invariant Subspaces): If for a given operator $A$ the minimal polynomial $\psi(\lambda)$ of the space is represented over $F$ in the form of a product of two co-prime polynomials $\psi_1(\lambda)$ and $\psi_2(\lambda)$ (with highest coefficients 1)

$$\psi(\lambda)=\psi_1(\lambda)\psi_2(\lambda), \tag{8}$$

then the whole space $R$ splits into two invariant subspaces $I_1$ and $I_2$

$$R=I_1+I_2, \tag{9}$$

whose minimal polynomials are $\psi_1(\lambda)$ and $\psi_2(\lambda)$, respectively.

Proof. We denote by $I_1$ the set of all vectors $x\in R$ satisfying the equation $\psi_1(A)x=o$. $I_2$ is similarly defined by the equation $\psi_2(A)x=o$. $I_1$ and $I_2$ so defined are subspaces of $R$.

Since $\psi_1(\lambda)$ and $\psi_2(\lambda)$ are co-prime, there exist polynomials $\chi_1(\lambda)$ and $\chi_2(\lambda)$ (with coefficients in $F$) such that

$$1=\psi_1(\lambda)\chi_1(\lambda)+\psi_2(\lambda)\chi_2(\lambda). \tag{10}$$

Now let $x$ be an arbitrary vector of $R$. In (10) we replace $\lambda$ by $A$ and we apply both sides of the operator equation so obtained to the vector $x$:

$$x=\psi_1(A)\chi_1(A)x+\psi_2(A)\chi_2(A)x, \tag{11}$$

i.e.,

$$x=x'+x'', \tag{12}$$

where

$$x'=\psi_2(A)\chi_2(A)x,\qquad x''=\psi_1(A)\chi_1(A)x. \tag{13}$$

Furthermore,

$$\psi_1(A)x'=\psi(A)\chi_2(A)x=o,\qquad \psi_2(A)x''=\psi(A)\chi_1(A)x=o,$$

i.e., $x'\in I_1$ and $x''\in I_2$.

$I_1$ and $I_2$ have only the null vector in common. For if $x_0\in I_1$ and $x_0\in I_2$, i.e., $\psi_1(A)x_0=o$ and $\psi_2(A)x_0=o$, then by (11)

$$x_0=\chi_1(A)\psi_1(A)x_0+\chi_2(A)\psi_2(A)x_0=o.$$

Thus we have proved that $R=I_1+I_2$.




Now suppose that $x\in I_1$. Then $\psi_1(A)x=o$. Multiplying both sides of this equation by $A$ and reversing the order of $A$ and $\psi_1(A)$, we obtain $\psi_1(A)Ax=o$, i.e., $Ax\in I_1$. This proves that the subspace $I_1$ is invariant with respect to $A$. The invariance of the subspace $I_2$ is proved similarly.

We shall now show that $\psi_1(\lambda)$ is the minimal polynomial of $I_1$. Let $\varphi_1(\lambda)$ be an arbitrary annihilating polynomial for $I_1$, and $x$ an arbitrary vector of $R$. Using the decomposition (12) already established, we write:

$$\varphi_1(A)\psi_2(A)x=\psi_2(A)\varphi_1(A)x'+\varphi_1(A)\psi_2(A)x''=o.$$

Since $x$ is an arbitrary vector of $R$, it follows that the product $\varphi_1(\lambda)\psi_2(\lambda)$ is an annihilating polynomial for $R$ and is therefore divisible by $\psi(\lambda)=\psi_1(\lambda)\psi_2(\lambda)$ without remainder; in other words, $\varphi_1(\lambda)$ is divisible by $\psi_1(\lambda)$. But $\varphi_1(\lambda)$ is an arbitrary annihilating polynomial for $I_1$, and $\psi_1(\lambda)$ is a particular one of the annihilating polynomials (by the definition of $I_1$). Hence $\psi_1(\lambda)$ is the minimal polynomial of $I_1$. In exactly the same way it is shown that $\psi_2(\lambda)$ is the minimal polynomial for the invariant subspace $I_2$. This completes the proof of the theorem.
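The proof of Theorem 1 is constructive: formulas (10)-(13) give the components $x'$ and $x''$ explicitly. Here is a small Python sketch. It assumes the matrix $A$ of Example 1 in Chapter VI, § 9, whose entries are taken to be the values consistent with that example's worked columns. There $\psi(\lambda)=(\lambda-1)^2(\lambda+1)$. We take $\psi_1(\lambda)=(\lambda-1)^2$, $\psi_2(\lambda)=\lambda+1$, and for (10) the polynomials $\chi_1(\lambda)=\tfrac14$, $\chi_2(\lambda)=\tfrac{3-\lambda}{4}$. We found these by hand for this example: indeed $\tfrac14(\lambda-1)^2+\tfrac14(\lambda+1)(3-\lambda)=1$.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def comb(*terms):
    """Entrywise sum of c * M over the given (c, M) pairs."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(cols)]
            for i in range(rows)]


A = [[1, 0, 0, 1, -1], [0, 1, -2, 3, -3], [0, 0, -1, 2, -2],
     [1, -1, 1, 0, 1], [1, -1, 1, -1, 2]]
n = len(A)
E = [[int(i == j) for j in range(n)] for i in range(n)]
Z = [[0] * n for _ in range(n)]

A_minus_E = comb((1, A), (-1, E))
psi1_A = mat_mul(A_minus_E, A_minus_E)                    # psi_1(A) = (A - E)^2
psi2_A = comb((1, A), (1, E))                             # psi_2(A) = A + E
chi1_A = comb((Fraction(1, 4), E))                        # chi_1 = 1/4
chi2_A = comb((Fraction(3, 4), E), (Fraction(-1, 4), A))  # chi_2 = (3 - lambda)/4

P1 = mat_mul(psi2_A, chi2_A)   # x -> x'  = psi_2(A) chi_2(A) x, as in (13)
P2 = mat_mul(psi1_A, chi1_A)   # x -> x'' = psi_1(A) chi_1(A) x

print(comb((1, P1), (1, P2)) == E)                         # (11): x = x' + x''
print(mat_mul(psi1_A, P1) == Z, mat_mul(psi2_A, P2) == Z)  # x' in I_1, x'' in I_2
# P1 and P2 are idempotent, so their traces equal dim I_1 and dim I_2.
print(sum(P1[i][i] for i in range(n)), sum(P2[i][i] for i in range(n)))
```

The output is `True`, `True True`, and `4 1`: here $I_1$ is 4-dimensional and $I_2$ is 1-dimensional. This matches the elementary divisors $(\lambda-1)^2$, $(\lambda-1)^2$, $\lambda+1$ of that example.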


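Let us decompose $\psi(\lambda)$ into irreducible factors over $F$:

$$\psi(\lambda)=[\varphi_1(\lambda)]^{c_1}[\varphi_2(\lambda)]^{c_2}\cdots[\varphi_s(\lambda)]^{c_s} \tag{14}$$

(here $\varphi_1(\lambda), \varphi_2(\lambda), \ldots, \varphi_s(\lambda)$ are distinct irreducible polynomials over $F$ with highest coefficient 1). Then, by repeated application of the theorem, we have

$$R=I_1+I_2+\cdots+I_s, \tag{15}$$

where $I_k$ is an invariant subspace with the minimal polynomial $[\varphi_k(\lambda)]^{c_k}$ ($k=1, 2, \ldots, s$).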

Thus, the theorem reduces the study of the behaviour of a linear operator 
in an arbitrary space to the study of the behaviour of this operator in a 
space where the minimal polynomial is a power of an irreducible polynomial 
over F. We shall take advantage of this to prove the following important 
theorem : 


THEOREM 2: In a vector space there always exists a vector whose minimal polynomial coincides with the minimal polynomial of the whole space.

We consider first the special case where the minimal polynomial of the space $R$ is a power of an irreducible polynomial $\varphi(\lambda)$:

$$\psi(\lambda)=[\varphi(\lambda)]^l.$$




In $R$ we choose a basis $e_1, e_2, \ldots, e_n$. The minimal polynomial of $e_i$ is a divisor of $\psi(\lambda)$ and is therefore representable in the form $[\varphi(\lambda)]^{l_i}$, where $l_i\le l$ ($i=1, 2, \ldots, n$).

But the minimal polynomial of the space is the least common multiple of the minimal polynomials of the basis vectors, so that $\psi(\lambda)$ is the largest of the powers $[\varphi(\lambda)]^{l_i}$ ($i=1, 2, \ldots, n$). In other words, $\psi(\lambda)$ coincides with the minimal polynomial of one of the basis vectors $e_1, e_2, \ldots, e_n$.

Turning now to the general case, we prove the following preliminary lemma:

Lemma: If the minimal polynomials of the vectors $e'$ and $e''$ are co-prime, then the minimal polynomial of the sum vector $e'+e''$ is equal to the product of the minimal polynomials of the constituent vectors.


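Proof. Let $\chi_1(\lambda)$ and $\chi_2(\lambda)$ be the minimal polynomials of the vectors $e'$ and $e''$. By assumption, $\chi_1(\lambda)$ and $\chi_2(\lambda)$ are co-prime. Let $\chi(\lambda)$ be an arbitrary annihilating polynomial of the vector $e=e'+e''$. Then

$$\chi_2(A)\chi(A)e'=\chi_2(A)\chi(A)e-\chi(A)\chi_2(A)e''=o,$$

i.e., $\chi_2(\lambda)\chi(\lambda)$ is an annihilating polynomial of $e'$. Therefore $\chi_2(\lambda)\chi(\lambda)$ is divisible by $\chi_1(\lambda)$, and since $\chi_1(\lambda)$ and $\chi_2(\lambda)$ are co-prime, $\chi(\lambda)$ is divisible by $\chi_1(\lambda)$. It is proved similarly that $\chi(\lambda)$ is divisible by $\chi_2(\lambda)$. But $\chi_1(\lambda)$ and $\chi_2(\lambda)$ are co-prime. Therefore $\chi(\lambda)$ is divisible by the product $\chi_1(\lambda)\chi_2(\lambda)$. Thus, every annihilating polynomial of the vector $e$ is divisible by $\chi_1(\lambda)\chi_2(\lambda)$. On the other hand, $\chi_1(\lambda)\chi_2(\lambda)$ is itself an annihilating polynomial of $e$, since $\chi_1(A)\chi_2(A)e=\chi_2(A)\chi_1(A)e'+\chi_1(A)\chi_2(A)e''=o$. Therefore $\chi_1(\lambda)\chi_2(\lambda)$ is the minimal polynomial of the vector $e=e'+e''$.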

We now return to Theorem 2. For the proof in the general case we use the decomposition (15). Since the minimal polynomials of the subspaces $I_1, I_2, \ldots, I_s$ are powers of irreducible polynomials, our assertion is already proved for these subspaces. Therefore there exist vectors $e'\in I_1, e''\in I_2, \ldots, e^{(s)}\in I_s$ whose minimal polynomials are $[\varphi_1(\lambda)]^{c_1}, [\varphi_2(\lambda)]^{c_2}, \ldots, [\varphi_s(\lambda)]^{c_s}$, respectively. By the lemma, the minimal polynomial of the vector

$$e=e'+e''+\cdots+e^{(s)}$$

is equal to the product

$$[\varphi_1(\lambda)]^{c_1}[\varphi_2(\lambda)]^{c_2}\cdots[\varphi_s(\lambda)]^{c_s},$$

i.e., to the minimal polynomial of the space $R$.
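The lemma, and with it the construction in the proof of Theorem 2, can be seen on a concrete case. This hedged sketch uses the same conventions as before. The matrix is that of Example 1 in Chapter VI, § 9, with entries assumed to be the values consistent with that example's worked columns. The hypothetical choice $e'=(1,0,0,0,0)$ has minimal polynomial $(\lambda-1)^2$, and $e''=(0,1,1,0,0)$ has minimal polynomial $\lambda+1$. These are co-prime, so by the lemma the sum should have minimal polynomial $(\lambda-1)^2(\lambda+1)$. That product is also the minimal polynomial of the whole space.

```python
from fractions import Fraction


def mat_vec(A, x):
    """Apply the matrix A (list of rows) to the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def express(vectors, target):
    """Coefficients c with sum(c[k]*vectors[k]) == target, or None
    (the vectors are assumed linearly independent)."""
    n, p = len(target), len(vectors)
    M = [[Fraction(v[i]) for v in vectors] + [Fraction(target[i])]
         for i in range(n)]
    r = 0
    for j in range(p):
        piv = next((i for i in range(r, n) if M[i][j] != 0), None)
        if piv is None:
            return None
        M[r], M[piv] = M[piv], M[r]
        pv = M[r][j]
        M[r] = [v / pv for v in M[r]]
        for i in range(n):
            if i != r and M[i][j] != 0:
                f = M[i][j]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    if any(M[i][p] != 0 for i in range(r, n)):
        return None
    return [M[i][p] for i in range(p)]


def vector_minpoly(A, x):
    """Monic minimal polynomial of x with respect to A, constant term first."""
    krylov = [[Fraction(v) for v in x]]
    if not any(krylov[0]):
        return [Fraction(1)]
    while (c := express(krylov, mat_vec(A, krylov[-1]))) is None:
        krylov.append(mat_vec(A, krylov[-1]))
    return [-ck for ck in c] + [Fraction(1)]


def poly_mul(p, q):
    """Product of two coefficient lists (constant term first)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


A = [[1, 0, 0, 1, -1], [0, 1, -2, 3, -3], [0, 0, -1, 2, -2],
     [1, -1, 1, 0, 1], [1, -1, 1, -1, 2]]
e1 = [1, 0, 0, 0, 0]                   # e'
e2 = [0, 1, 1, 0, 0]                   # e''
s = [a + b for a, b in zip(e1, e2)]    # e = e' + e''
p1, p2, ps = (vector_minpoly(A, v) for v in (e1, e2, s))
print([str(c) for c in p1], [str(c) for c in p2])
print(ps == poly_mul(p1, p2), [str(c) for c in ps])
```

This prints `['1', '-2', '1'] ['1', '1']` and then `True ['1', '-1', '-1', '1']`. So the sum vector has the full minimal polynomial $\lambda^3-\lambda^2-\lambda+1$ of the space, as Theorem 2 predicts.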


§ 3. Congruence. Factor Space 


1. Suppose given a subspace $I\subset R$. We shall say that two vectors $x, y$ of $R$ are congruent modulo $I$ and shall write $x\equiv y\pmod I$ if and only if $y-x\in I$. It is easy to verify that the concept of congruence so introduced has the following properties:

For all $x, y, z\in R$

1. $x\equiv x\pmod I$ (reflexivity of congruence).

2. From $x\equiv y\pmod I$ it follows that $y\equiv x\pmod I$ (symmetry of congruence).

3. From $x\equiv y\pmod I$ and $y\equiv z\pmod I$ it follows that $x\equiv z\pmod I$ (transitivity of congruence).


The presence of these three properties enables us to make use of congruence to divide all the vectors of the space into classes, by assigning vectors that are pairwise congruent (mod $I$) to the same class (vectors of distinct classes are incongruent (mod $I$)). The class containing the vector $x$ will be denoted by $\bar x$.^5 The subspace $I$ is one of these classes, namely $\bar o$. Note that to every congruence $x\equiv y\pmod I$ there corresponds the equality^6 of the associated classes: $\bar x=\bar y$.

It is elementary to prove that congruences may be added term by term and multiplied by a number of $F$:

1. From

$$x\equiv x'\quad\text{and}\quad y\equiv y'\pmod I$$

it follows that

$$x+y\equiv x'+y'\pmod I.$$

2. From

$$x\equiv x'\pmod I$$

it follows that

$$\alpha x\equiv\alpha x'\pmod I\qquad(\alpha\in F).$$


These properties of congruence show that the operations of addition and multiplication by a number of $F$ do not ‘break up’ the classes. If we take two classes $\bar x$ and $\bar y$ and add elements $x, x', \ldots$ of the first class to arbitrary elements $y, y', \ldots$ of the second class, then all the sums so obtained belong to one and the same class, which we call the sum of the classes $\bar x$ and $\bar y$ and denote by $\bar x+\bar y$. Similarly, if all the vectors $x, x', \ldots$ of the class $\bar x$ are multiplied by a number $\alpha\in F$, then the products belong to one class, which we denote by $\alpha\bar x$.

Thus, in the manifold $\bar R$ of all classes $\bar x, \bar y, \ldots$ two operations are introduced: ‘addition’ and ‘multiplication by a number of $F$.’ It is easy to verify that these operations have the properties set forth in the definition of a vector space (Chapter III, § 1). Therefore $\bar R$, as well as $R$, is a vector space over the field $F$. We shall say that $\bar R$ is a factor space of $R$. If $n, m, \bar n$ are the dimensions of the spaces $R, I, \bar R$, respectively, then $\bar n=n-m$.

5 Since each class contains an infinite set of vectors, there is, by this condition, an infinite number of ways of designating the class.

6 That is, identity.


2. All the concepts introduced in this section can be illustrated very well by the following example.

Example. Let $R$ be the set of all vectors of a three-dimensional space and $F$ the field of real numbers. For greater clarity, we shall represent vectors in the form of directed segments beginning at a point $O$. Let $I$ be a straight line passing through $O$ (more accurately: the set of vectors that lie along some line passing through $O$; Fig. 4).

[Fig. 4: the line $I$ through $O$, with the vectors $x$, $x'$, $y$ and $x+y$.]

The congruence $x\equiv x'\pmod I$ signifies that the vectors $x$ and $x'$ differ by a vector of $I$, i.e., the segment joining the end-points of $x$ and $x'$ is parallel to $I$. Therefore the class $\bar x$ is represented by the line passing through the end-point of $x$ and parallel to $I$ (more accurately: by the ‘bundle’ of vectors starting from $O$ whose end-points lie on that line). ‘Bundles’ may be added and multiplied by a real number (by adding and multiplying the vectors that occur in the bundles). These ‘bundles’ are also the elements of the factor space $\bar R$. In this example, $n=3$, $m=1$, $\bar n=2$.

We obtain another example by taking for $I$ a plane passing through $O$. In this example, $n=3$, $m=2$, $\bar n=1$.

Now let $A$ be a linear operator in $R$. Let us assume that $I$ is an invariant subspace with respect to $A$. The reader will easily prove that from $x\equiv x'\pmod I$ it follows that $Ax\equiv Ax'\pmod I$, so that the operator $A$ can be applied to both sides of a congruence. In other words, if the operator $A$ is applied to all vectors $x, x', \ldots$ of a class $\bar x$, then the vectors $Ax, Ax', \ldots$ also belong to one class, which we denote by $A\bar x$. The linear operator $A$ carries classes into classes and is, thus, a linear operator in $\bar R$.

We shall say that the vectors $x_1, x_2, \ldots, x_p$ are linearly dependent modulo $I$ if there exist numbers $\alpha_1, \alpha_2, \ldots, \alpha_p$ in $F$, not all equal to zero, such that

$$\alpha_1x_1+\alpha_2x_2+\cdots+\alpha_px_p\equiv o\pmod I. \tag{16}$$




Note that not only the concept of linear dependence of vectors, but also all the concepts, statements, and reasonings in the preceding sections of this chapter can be repeated word for word with the symbol ‘$=$’ replaced throughout by the symbol ‘$\equiv\pmod I$’, where $I$ is some fixed subspace invariant with respect to $A$.

Thus, we can introduce the concepts of an annihilating polynomial and of the minimal polynomial of a vector or a space (mod $I$). All these concepts will be called ‘relative,’ in contrast to the ‘absolute’ concepts that were introduced earlier (and that held for the symbol ‘$=$’).

The reader should observe that the relative minimal polynomial (of a vector or a space) is a divisor of the absolute one. For example, let $\sigma_1(\lambda)$ be the relative minimal polynomial of a vector $x$ and $\sigma(\lambda)$ the corresponding absolute minimal polynomial.

Then

$$\sigma(A)x=o,$$

and hence it follows that also

$$\sigma(A)x\equiv o\pmod I.$$

Therefore $\sigma(\lambda)$ is a relative annihilating polynomial of $x$ and as such is divisible by the relative minimal polynomial $\sigma_1(\lambda)$.

Side by side with the ‘absolute’ statements of the preceding sections we have ‘relative’ statements. For example, we have the statement: ‘In every space there always exists a vector whose relative minimal polynomial coincides with the relative minimal polynomial of the whole space.’

The truth of all ‘relative’ statements depends on the fact that by operating with congruences modulo $I$ we deal essentially with equalities, however not in the space $R$, but in the space $\bar R$.


§ 4. Decomposition of a Space into Cyclic Invariant Subspaces 


1. Let $\sigma(\lambda)=\lambda^p+\alpha_1\lambda^{p-1}+\cdots+\alpha_{p-1}\lambda+\alpha_p$ be the minimal polynomial of a vector $e$. Then the vectors

$$e,\ Ae,\ \ldots,\ A^{p-1}e \tag{17}$$

are linearly independent, and

$$A^pe=-\alpha_pe-\alpha_{p-1}Ae-\cdots-\alpha_1A^{p-1}e. \tag{18}$$


The vectors (17) form a basis of a $p$-dimensional subspace $I$. We shall call this subspace cyclic in view of the special character of the basis (17) and of (18).^7 The operator $A$ carries the first vector of (17) into the second, the second into the third, etc. The last basis vector is carried by $A$ into a linear combination of the basis vectors in accordance with (18). Thus, $A$ carries every basis vector into a vector of $I$ and hence an arbitrary vector of $I$ into another vector of $I$. In other words, a cyclic subspace is always invariant with respect to $A$.

Every vector $x\in I$ is representable in the form of a linear combination of the basis vectors (17), i.e., in the form

$$x=\chi(A)e, \tag{19}$$

where $\chi(\lambda)$ is a polynomial in $\lambda$ of degree $\le p-1$ with coefficients in $F$. By forming all possible polynomials $\chi(\lambda)$ of degree $\le p-1$ with coefficients in $F$ we obtain all the vectors of $I$, each once only, i.e., for only one polynomial $\chi(\lambda)$. In view of the basis (17) or the formula (19) we shall say that the vector $e$ generates the subspace $I$.

Note that the minimal polynomial of the generating vector $e$ is also the minimal polynomial of the whole subspace $I$.
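Formulas (17) and (18) say that, in the basis $e, Ae, \ldots, A^{p-1}e$, the operator $A$ restricted to the cyclic subspace has a very simple matrix. It has ones just below the diagonal, and its last column is $-\alpha_p, -\alpha_{p-1}, \ldots, -\alpha_1$. Below is a Python sketch that computes this matrix directly from coordinates, using exact rationals; the function names are ours. It uses the matrix of Example 2 in Chapter VI, § 9 and the hypothetical choice $e=(1,0,0,0)$, whose minimal polynomial is $(\lambda+1)^3=\lambda^3+3\lambda^2+3\lambda+1$.

```python
from fractions import Fraction


def mat_vec(A, x):
    """Apply the matrix A (list of rows) to the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def express(vectors, target):
    """Coefficients c with sum(c[k]*vectors[k]) == target, or None
    (the vectors are assumed linearly independent)."""
    n, p = len(target), len(vectors)
    M = [[Fraction(v[i]) for v in vectors] + [Fraction(target[i])]
         for i in range(n)]
    r = 0
    for j in range(p):
        piv = next((i for i in range(r, n) if M[i][j] != 0), None)
        if piv is None:
            return None
        M[r], M[piv] = M[piv], M[r]
        pv = M[r][j]
        M[r] = [v / pv for v in M[r]]
        for i in range(n):
            if i != r and M[i][j] != 0:
                f = M[i][j]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    if any(M[i][p] != 0 for i in range(r, n)):
        return None
    return [M[i][p] for i in range(p)]


def cyclic_subspace(A, e):
    """Basis (17) of the cyclic subspace generated by e != o, and the matrix
    of A on it: column k holds the coordinates of A applied to basis[k]."""
    basis = [[Fraction(v) for v in e]]
    if not any(basis[0]):
        raise ValueError("e must be non-zero")
    # Extend e, Ae, A^2 e, ... until A^p e depends on its predecessors, cf. (18).
    while express(basis, mat_vec(A, basis[-1])) is None:
        basis.append(mat_vec(A, basis[-1]))
    # Never None: the cyclic subspace is invariant under A.
    coords = [express(basis, mat_vec(A, b)) for b in basis]
    p = len(basis)
    return basis, [[coords[k][i] for k in range(p)] for i in range(p)]


A = [[1, -1, 1, -1], [-3, 3, -5, 4], [8, -4, 3, -4], [15, -10, 11, -11]]
basis, M = cyclic_subspace(A, [1, 0, 0, 0])
for row in M:
    print([str(x) for x in row])
```

The three printed rows are `['0', '0', '-1']`, `['1', '0', '-3']`, `['0', '1', '-3']`. The last column is $-\alpha_3, -\alpha_2, -\alpha_1=-1,-3,-3$, exactly as (18) requires.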


2. We are now ready to establish the fundamental proposition of the whole theory, according to which the space $R$ splits into cyclic subspaces.

Let $\psi_1(\lambda)=\psi(\lambda)=\lambda^m+\alpha_1\lambda^{m-1}+\cdots+\alpha_m$ be the minimal polynomial of the space $R$. Then there exists a vector $e$ in the space for which this polynomial is minimal (Theorem 2, p. 180). Let $I_1$ denote the cyclic subspace with the basis

$$e,\ Ae,\ \ldots,\ A^{m-1}e. \tag{20}$$

If $n=m$, then $R=I_1$. Suppose that $n>m$ and that the polynomial

$$\psi_2(\lambda)=\lambda^p+\beta_1\lambda^{p-1}+\cdots+\beta_p$$

is the minimal polynomial of $R$ (mod $I_1$). By the remark at the end of § 3, $\psi_2(\lambda)$ is a divisor of $\psi_1(\lambda)$, i.e., there exists a polynomial $\kappa(\lambda)$ such that

$$\psi_1(\lambda)=\psi_2(\lambda)\kappa(\lambda). \tag{21}$$

7 It would be more accurate to call this subspace cyclic with respect to the linear operator A. But since the whole theory is built up with reference to a single operator A, the words ‘with respect to the linear operator A’ are omitted for the sake of brevity (see the similar remark in footnote 2, p. 176).




Moreover, in $R$ there exists a vector $g^*$ whose relative minimal polynomial is $\psi_2(\lambda)$. Then

$$\psi_2(A)g^*\equiv o\pmod{I_1}, \tag{22}$$

i.e., there exists a polynomial $\chi(\lambda)$ of degree $\le m-1$ such that

$$\psi_2(A)g^*=\chi(A)e. \tag{23}$$

We apply the operator $\kappa(A)$ to both sides of the equation. Then by (21) we obtain on the left $\psi_1(A)g^*$, i.e., zero, because $\psi_1(\lambda)$ is the absolute minimal polynomial of the space; therefore

$$\kappa(A)\chi(A)e=o.$$

This equation shows that the product $\kappa(\lambda)\chi(\lambda)$ is an annihilating polynomial of the vector $e$ and is therefore divisible by the minimal polynomial $\psi_1(\lambda)=\kappa(\lambda)\psi_2(\lambda)$, so that $\chi(\lambda)$ is divisible by $\psi_2(\lambda)$:

$$\chi(\lambda)=\chi_1(\lambda)\psi_2(\lambda), \tag{24}$$

where $\chi_1(\lambda)$ is a polynomial. Using this decomposition of $\chi(\lambda)$, we may rewrite (23) as follows:

$$\psi_2(A)[g^*-\chi_1(A)e]=o. \tag{25}$$

We now introduce the vector

$$g=g^*-\chi_1(A)e. \tag{26}$$

Then (25) can be written as follows:

$$\psi_2(A)g=o. \tag{27}$$

The last equation shows that $\psi_2(\lambda)$ is an absolute annihilating polynomial of the vector $g$ and is therefore divisible by the absolute minimal polynomial of $g$. On the other hand, we have from (26):

$$g\equiv g^*\pmod{I_1}. \tag{28}$$

Hence $\psi_2(\lambda)$, being the relative minimal polynomial of $g^*$, is the same for $g$ as well. Since the absolute minimal polynomial of $g$ is in turn divisible by its relative minimal polynomial (§ 3), comparing the last two statements we deduce that $\psi_2(\lambda)$ is simultaneously the relative and the absolute minimal polynomial of $g$.

From the fact that $\psi_2(\lambda)$ is the absolute minimal polynomial of $g$ it follows that the subspace $I_2$ with the basis

$$g,\ Ag,\ \ldots,\ A^{p-1}g \tag{29}$$

is cyclic.

From the fact that $\psi_2(\lambda)$ is the relative minimal polynomial of $g$ (mod $I_1$) it follows that the vectors (29) are linearly independent (mod $I_1$), i.e., no linear combination of them with coefficients not all zero can be equal to a linear combination of the vectors (20). Since the latter are themselves linearly independent, our last statement asserts the linear independence of the $m+p$ vectors

$$e,\ Ae,\ \ldots,\ A^{m-1}e;\quad g,\ Ag,\ \ldots,\ A^{p-1}g. \tag{30}$$

The vectors (30) form a basis of the invariant subspace $I_1+I_2$ of dimension $m+p$.

If $n=m+p$, then $R=I_1+I_2$. If $n>m+p$, we consider $R$ (mod $I_1+I_2$) and continue our process of separating cyclic subspaces. Since the whole space $R$ is of finite dimension $n$, this process must come to an end with some subspace $I_t$, where $t\le n$.

We have arrived at the following theorem:

THEOREM 3 (Second Theorem on the Decomposition of a Space into Invariant Subspaces): Relative to a given linear operator $A$ the space can always be split into cyclic subspaces $I_1, I_2, \ldots, I_t$ with minimal polynomials $\psi_1(\lambda), \psi_2(\lambda), \ldots, \psi_t(\lambda)$

$$R=I_1+I_2+\cdots+I_t, \tag{31}$$

such that $\psi_1(\lambda)$ coincides with the minimal polynomial $\psi(\lambda)$ of the whole space and that each $\psi_{i-1}(\lambda)$ is divisible by $\psi_i(\lambda)$ ($i=2, 3, \ldots, t$).
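Take the matrix of Example 2 in Chapter VI, § 9. A decomposition of the kind described in Theorem 3 has $t=2$, $\psi_1(\lambda)=(\lambda+1)^3$ and $\psi_2(\lambda)=\lambda+1$, so $\psi_2(\lambda)$ indeed divides $\psi_1(\lambda)$. The following Python sketch checks one such decomposition. As generators we assume $e=(0,0,1,0)$ for $I_1$ and $g=(1,0,5,7)$ for $I_2$, the vectors that produced the matrix $T$ in that example. This is our choice; other generators would serve equally well.

```python
from fractions import Fraction


def mat_vec(A, x):
    """Apply the matrix A (list of rows) to the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def det(M):
    """Determinant by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d


A = [[1, -1, 1, -1], [-3, 3, -5, 4], [8, -4, 3, -4], [15, -10, 11, -11]]
B = [[a + int(i == j) for j, a in enumerate(row)] for i, row in enumerate(A)]  # A + E

e = [0, 0, 1, 0]   # assumed generator of I_1
g = [1, 0, 5, 7]   # assumed generator of I_2

B2e = mat_vec(B, mat_vec(B, e))
B3e = mat_vec(B, B2e)
# psi_1 = (lambda + 1)^3 is the minimal polynomial of e:
# (A + E)^3 e = o while (A + E)^2 e != o.
print(B3e == [0, 0, 0, 0], any(B2e))
# psi_2 = lambda + 1 is the minimal polynomial of g: (A + E) g = o, g != o.
print(mat_vec(B, g) == [0, 0, 0, 0])
# Bases (20) and (29) together: e, Ae, A^2 e, g (as columns). A non-zero
# determinant means they form a basis of R, i.e. R = I_1 + I_2 (3 + 1).
Ae = mat_vec(A, e)
A2e = mat_vec(A, Ae)
basis = [list(r) for r in zip(e, Ae, A2e, g)]
print(det(basis))
```

The output is `True True`, `True`, and `1`. The four vectors form a basis, so $R=I_1+I_2$ with $\dim I_1=3=\deg\psi_1$ and $\dim I_2=1=\deg\psi_2$.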


3. We now mention some properties of cyclic spaces. Let $R$ be a cyclic $n$-dimensional space and $\psi(\lambda)=\lambda^m+\cdots$ its minimal polynomial. Then it follows from the definition of a cyclic space that $m=n$. Conversely, suppose that $R$ is an arbitrary space and that it is known that $m=n$. Applying the proof of the decomposition theorem, we represent $R$ in the form (31). But the dimension of the cyclic subspace $I_1$ is $m$, because its minimal polynomial coincides with the minimal polynomial of the whole space. Since $m=n$ by assumption, we have $R=I_1$, i.e., $R$ is a cyclic space.

Thus we have established the following criterion for cyclicity of a space:

THEOREM 4: A space is cyclic if and only if its dimension is equal to the degree of its minimal polynomial.

Next, suppose that we have a decomposition of a cyclic space $R$ into two invariant subspaces $I_1$ and $I_2$:

$$R=I_1+I_2. \tag{32}$$




We denote the dimensions of $R$, $I_1$, and $I_2$ by $n$, $n_1$, and $n_2$, their minimal polynomials by $\psi(\lambda)$, $\psi_1(\lambda)$, and $\psi_2(\lambda)$, and the degrees of these minimal polynomials by $m$, $m_1$, and $m_2$, respectively. Then

$$m_1\le n_1,\qquad m_2\le n_2. \tag{33}$$

We add these inequalities term by term:

$$m_1+m_2\le n_1+n_2. \tag{34}$$

Since $\psi(\lambda)$ is the least common multiple of $\psi_1(\lambda)$ and $\psi_2(\lambda)$, we have

$$m\le m_1+m_2. \tag{35}$$

Moreover, it follows from (32) that

$$n=n_1+n_2. \tag{36}$$

(34), (35), and (36) give us a chain of relations

$$m\le m_1+m_2\le n_1+n_2=n. \tag{37}$$

But since the space $R$ is cyclic, the extreme numbers of this chain, $m$ and $n$, are equal. Therefore we have equality in the middle terms, i.e.,

$$m=m_1+m_2=n_1+n_2.$$

From the fact that $m=m_1+m_2$ we deduce that $\psi_1(\lambda)$ and $\psi_2(\lambda)$ are co-prime: the degree of their least common multiple equals $m_1+m_2$ minus the degree of their greatest common divisor.

Bearing (33) in mind, we find from $m_1+m_2=n_1+n_2$ that

$$m_1=n_1\quad\text{and}\quad m_2=n_2. \tag{38}$$

By Theorem 4, these equations mean that the subspaces $I_1$ and $I_2$ are cyclic.

Thus we have arrived at the following proposition:


THEOREM 5: A cyclic space can only split into invariant subspaces that 1. are also cyclic and 2. have co-prime minimal polynomials.

The same arguments (in the opposite order) show that Theorem 5 has a converse:

THEOREM 6: If a space is split into invariant subspaces that 1. are cyclic and 2. have co-prime minimal polynomials, then the space itself is cyclic.

Suppose now that $R$ is a cyclic space and that its minimal polynomial is a power of an irreducible polynomial over $F$: $\psi(\lambda)=[\varphi(\lambda)]^c$. In this case the minimal polynomial of every invariant subspace of $R$ must also be a power of this irreducible polynomial $\varphi(\lambda)$. Therefore the minimal polynomials of any two invariant subspaces cannot be co-prime. But then, by what we have proved, $R$ cannot split into invariant subspaces.

Suppose, conversely, that some space $R$ is known not to split into invariant subspaces. Then $R$ is a cyclic space, for otherwise, by the second decomposition theorem, it could be split into cyclic subspaces; moreover, the minimal polynomial of $R$ must be a power of an irreducible polynomial, because otherwise $R$ could be split into invariant subspaces, by the first decomposition theorem.

Thus we have reached the following conclusion:

THEOREM 7: A space does not split into invariant subspaces if and only if 1. it is cyclic and 2. its minimal polynomial is a power of an irreducible polynomial over $F$.

We now return to the decomposition (31) and split the minimal polynomials $\psi_1(\lambda), \psi_2(\lambda), \ldots, \psi_t(\lambda)$ of the cyclic subspaces $I_1, I_2, \ldots, I_t$ into irreducible factors over $F$:

$$\begin{aligned}\psi_1(\lambda)&=[\varphi_1(\lambda)]^{c_1}[\varphi_2(\lambda)]^{c_2}\cdots[\varphi_s(\lambda)]^{c_s},\\ \psi_2(\lambda)&=[\varphi_1(\lambda)]^{d_1}[\varphi_2(\lambda)]^{d_2}\cdots[\varphi_s(\lambda)]^{d_s},\\ &\ \ \cdots\cdots\cdots\cdots\\ \psi_t(\lambda)&=[\varphi_1(\lambda)]^{l_1}[\varphi_2(\lambda)]^{l_2}\cdots[\varphi_s(\lambda)]^{l_s}\end{aligned} \tag{39}$$

($c_k\ge d_k\ge\cdots\ge l_k\ge0$; $k=1, 2, \ldots, s$).^8

To $I_1$ we apply the first decomposition theorem. Then we obtain

$$I_1=I_1'+I_1''+\cdots+I_1^{(s)},$$

where $I_1', I_1'', \ldots, I_1^{(s)}$ are cyclic subspaces with the minimal polynomials $[\varphi_1(\lambda)]^{c_1}, [\varphi_2(\lambda)]^{c_2}, \ldots, [\varphi_s(\lambda)]^{c_s}$. Similarly we decompose the spaces $I_2, \ldots, I_t$. In this way we obtain a decomposition of the whole space $R$ into cyclic subspaces with the minimal polynomials $[\varphi_k(\lambda)]^{c_k}, [\varphi_k(\lambda)]^{d_k}, \ldots, [\varphi_k(\lambda)]^{l_k}$ ($k=1, 2, \ldots, s$). (Here we neglect the powers whose exponents are zero.) From Theorem 7 it follows that these cyclic subspaces are indecomposable (into invariant subspaces). We have thus arrived at the following theorem:


THEOREM 8 (Third Theorem on the Decomposition of a Space into Invariant Subspaces): A space can always be split into cyclic invariant subspaces

$$R = I' + I'' + \cdots + I^{(u)}, \qquad (40)$$

such that the minimal polynomial of each of these cyclic subspaces is a power of an irreducible polynomial.

This theorem gives the decomposition of a space into indecomposable invariant subspaces.


8 Some of the exponents $d_k, \ldots, l_k$ ($k = 1, 2, \ldots, s$) may be equal to zero.


190 VII. Structure of a Linear Operator in n-Dimensional Space


Note. Theorem 8 (the third decomposition theorem) has been proved 
by applying the first two decomposition theorems. But it can also be 
obtained by other means, namely, as an immediate (and almost trivial) 
corollary of Theorem 7. 

For if the space R splits at all, then it can always be split into indecomposable invariant subspaces:

$$R = I' + I'' + \cdots + I^{(u)}. \qquad (40)$$

By Theorem 7, each of the constituent subspaces is cyclic and has as its minimal polynomial a power of an irreducible polynomial over F.


§ 5. The Normal Form of a Matrix 


1. Let $I_1$ be an m-dimensional invariant subspace of R. In $I_1$ we take an arbitrary basis $e_1, e_2, \ldots, e_m$ and complement it to a basis

$$e_1, e_2, \ldots, e_m, e_{m+1}, \ldots, e_n$$

of R. Let us see what the matrix A of the operator A looks like in this basis. We remind the reader that the k-th column of A consists of the coordinates of the vector $Ae_k$ ($k = 1, 2, \ldots, n$). For $k \le m$ the vector $Ae_k \in I_1$ (by the invariance of $I_1$) and the last $n - m$ coordinates of $Ae_k$ are zero. Therefore A has the following form


$$A = \begin{pmatrix} A_1 & A_3\\ O & A_2 \end{pmatrix}\begin{matrix} \}\, m\\ \}\, n-m \end{matrix} \qquad (41)$$


where $A_1$ and $A_2$ are square matrices of orders m and $n - m$, respectively, and $A_3$ is a rectangular matrix. The fact that the fourth 'block' is zero expresses the invariance of the subspace $I_1$. The matrix $A_1$ gives the operator A in $I_1$ (with respect to the basis $e_1, e_2, \ldots, e_m$).

Let us assume now that $e_{m+1}, \ldots, e_n$ is the basis of some invariant subspace $I_2$, so that $R = I_1 + I_2$ and a basis of the whole space is formed from the two parts that are the bases of the invariant subspaces $I_1$ and $I_2$. Then obviously the block $A_3$ in (41) is also equal to zero and the matrix A has the quasi-diagonal form




$$A = \begin{pmatrix} A_1 & O\\ O & A_2 \end{pmatrix} = \{A_1, A_2\}, \qquad (42)$$


where $A_1$ and $A_2$ are, respectively, square matrices of orders m and $n - m$ which give the operator in the subspaces $I_1$ and $I_2$ (with respect to the bases $e_1, e_2, \ldots, e_m$ and $e_{m+1}, \ldots, e_n$). It is not difficult to see that, conversely, to a quasi-diagonal form of the matrix there always corresponds a decomposition of the space into invariant subspaces (and the basis of the whole space is formed from the bases of these subspaces).


2. By the second decomposition theorem, we can split the whole space R into cyclic subspaces $I_1, I_2, \ldots, I_t$:

$$R = I_1 + I_2 + \cdots + I_t.$$

In the sequence of minimal polynomials of these subspaces $\psi_1(\lambda), \psi_2(\lambda), \ldots, \psi_t(\lambda)$ each polynomial is a divisor of the preceding one (from which it follows automatically that the first polynomial is the minimal polynomial of the whole space).


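Let

$$\begin{aligned}
\psi_1(\lambda) &= \lambda^p + \alpha_1\lambda^{p-1} + \cdots + \alpha_p,\\
\psi_2(\lambda) &= \lambda^q + \beta_1\lambda^{q-1} + \cdots + \beta_q,\\
&\;\;\vdots\\
\psi_t(\lambda) &= \lambda^v + \varepsilon_1\lambda^{v-1} + \cdots + \varepsilon_v
\end{aligned}
\qquad (p \ge q \ge \cdots \ge v).$$

We denote by $e, g, \ldots, l$ generating vectors of the subspaces $I_1, I_2, \ldots, I_t$, and we form a basis of the whole space from the following bases of the cyclic subspaces:

$$e, Ae, \ldots, A^{p-1}e;\quad g, Ag, \ldots, A^{q-1}g;\quad \ldots;\quad l, Al, \ldots, A^{v-1}l. \qquad (45)$$

Let us see what the matrix $L_I$ corresponding to A in this basis looks like.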


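As we have explained at the beginning of this section, the matrix $L_I$ must have quasi-diagonal form

$$L_I = \begin{pmatrix} L_1 & O & \cdots & O\\ O & L_2 & \cdots & O\\ \vdots & \vdots & \ddots & \vdots\\ O & O & \cdots & L_t \end{pmatrix}. \qquad (46)$$

The matrix $L_1$ corresponds to the operator A in $I_1$ with respect to the basis $e_1 = e$, $e_2 = Ae$, ..., $e_p = A^{p-1}e$. By applying the rule for the formation of the matrix for a given operator in a given basis (Chapter III, p. 67), we find

$$L_1 = \begin{pmatrix} 0 & 0 & \cdots & 0 & -\alpha_p\\ 1 & 0 & \cdots & 0 & -\alpha_{p-1}\\ 0 & 1 & \cdots & 0 & -\alpha_{p-2}\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & -\alpha_1 \end{pmatrix}. \qquad (47)$$

Similarly

$$L_2 = \begin{pmatrix} 0 & 0 & \cdots & 0 & -\beta_q\\ 1 & 0 & \cdots & 0 & -\beta_{q-1}\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & -\beta_1 \end{pmatrix}. \qquad (48)$$

Computing the characteristic polynomials of the matrices $L_1, L_2, \ldots, L_t$, we find:

$$|\lambda E - L_1| = \psi_1(\lambda),\quad |\lambda E - L_2| = \psi_2(\lambda),\quad \ldots,\quad |\lambda E - L_t| = \psi_t(\lambda)$$

(for cyclic subspaces the characteristic polynomial of an operator A coincides with the minimal polynomial of the subspace relative to this operator).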
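The statement that a block of the form (47) has $\psi_1(\lambda)$ as its characteristic polynomial is easy to check on examples. The following Python sketch is only an illustration, not part of the text: the helper names `companion` and `char_poly` are hypothetical, exact arithmetic comes from the standard `fractions` module, and the characteristic polynomial is computed by the Faddeev-LeVerrier recursion, a technique swapped in here because it needs no symbolic determinants.

```python
from fractions import Fraction


def companion(alphas):
    """Block (47) for psi(lam) = lam^p + a_1 lam^(p-1) + ... + a_p.

    alphas = [a_1, ..., a_p]. Column k (k < p) is the (k+1)-st unit vector,
    because A maps A^(k-1) e to A^k e; the last column holds -a_p, ..., -a_1.
    """
    p = len(alphas)
    L = [[Fraction(0)] * p for _ in range(p)]
    for i in range(1, p):
        L[i][i - 1] = Fraction(1)
    for i in range(p):
        L[i][p - 1] = -Fraction(alphas[p - 1 - i])
    return L


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def char_poly(A):
    """Coefficients [1, c_1, ..., c_n] of |lam E - A|, highest power first."""
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A M_(k-1) + c_(k-1) E
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        # c_k = -tr(A M_k) / k
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs


# psi(lam) = (lam - 1)(lam - 2)(lam - 3) = lam^3 - 6 lam^2 + 11 lam - 6
L1 = companion([-6, 11, -6])
print(char_poly(L1) == [1, -6, 11, -6])   # True
```

Coefficient lists are ordered from the highest power down, matching the way $\psi_1(\lambda)$ is written in the text.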

The matrix $L_I$ corresponds to the operator A in the 'canonical' basis (45). If A is the matrix corresponding to A in an arbitrary basis, then A is similar to $L_I$, i.e., there exists a non-singular matrix T such that

$$A = T L_I T^{-1}. \qquad (49)$$


Of the matrix $L_I$ we shall say that it has the first natural normal form. This form is characterized by

1) The quasi-diagonal form;

2) The special structure of the diagonal blocks (47), (48), etc.;

3) The additional condition: the characteristic polynomial of each diagonal block is divisible by the characteristic polynomial of the following block.

If we start not from the second, but from the third decomposition theorem, then in exactly the same way we would obtain a matrix $L_{II}$ corresponding to the operator A in the appropriate basis, a matrix having the second natural normal form, which is characterized by

1) The quasi-diagonal form

$$L_{II} = \{L^{(1)}, L^{(2)}, \ldots, L^{(u)}\};$$

2) The special structure of the diagonal blocks (47), (48), etc.;

3) The additional condition: the characteristic polynomial of each block is a power of an irreducible polynomial over F.


3. In the following section we shall show that in the class of similar matrices corresponding to one and the same operator there is one and only one matrix having the first normal form,⁹ and one and only one¹⁰ having the second normal form. Moreover, we shall give an algorithm for the computation of the polynomials $\psi_1(\lambda), \psi_2(\lambda), \ldots, \psi_t(\lambda)$ from the elements of the matrix A. Knowledge of these polynomials enables us to write out all the elements of the matrices $L_I$ and $L_{II}$ similar to A and having the first and second natural normal forms, respectively.


§ 6. Invariant Polynomials. Elementary Divisors 


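1. We¹¹ denote by $D_p(\lambda)$ the greatest common divisor of all the minors of order p of the characteristic matrix $A_\lambda = \lambda E - A$ ($p = 1, 2, \ldots, n$).¹² Since in the sequence

$$D_n(\lambda),\ D_{n-1}(\lambda),\ \ldots,\ D_1(\lambda)$$

each polynomial is divisible by the following, the formulas

$$i_1(\lambda) = \frac{D_n(\lambda)}{D_{n-1}(\lambda)},\quad i_2(\lambda) = \frac{D_{n-1}(\lambda)}{D_{n-2}(\lambda)},\quad \ldots,\quad i_n(\lambda) = \frac{D_1(\lambda)}{D_0(\lambda)} \qquad (D_0(\lambda) \equiv 1) \qquad (50)$$

define n polynomials whose product is equal to the characteristic polynomial

$$\Delta(\lambda) = |\lambda E - A| = D_n(\lambda) = i_1(\lambda)\, i_2(\lambda) \cdots i_n(\lambda). \qquad (51)$$

We split the polynomials $i_p(\lambda)$ ($p = 1, 2, \ldots, n$) into irreducible factors over F:

$$i_p(\lambda) = [\varphi_1(\lambda)]^{\gamma_p}[\varphi_2(\lambda)]^{\delta_p}\cdots \qquad (p = 1, 2, \ldots, n), \qquad (52)$$

where $\varphi_1(\lambda), \varphi_2(\lambda), \ldots$ are distinct irreducible polynomials over F.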


9 This does not mean that there exists only one canonical basis of the form (45). There may be many canonical bases, but to all of them there corresponds one and the same matrix $L_I$.

10 To within the order of the diagonal blocks.

11 In subsection 1 of the present section we repeat the basic concepts of Chapter VI, § 3 for the characteristic matrix that were there established for an arbitrary polynomial matrix.

12 We always take the highest coefficient of the greatest common divisor as 1. 




The polynomials $i_1(\lambda), i_2(\lambda), \ldots, i_n(\lambda)$ are called the invariant polynomials, and all the non-constant powers among $[\varphi_1(\lambda)]^{\gamma_p}, [\varphi_2(\lambda)]^{\delta_p}, \ldots$ are called the elementary divisors, of the characteristic matrix $A_\lambda = \lambda E - A$ or, simply, of A.

The product of all the elementary divisors, like the product of all the invariant polynomials, is equal to the characteristic polynomial $\Delta(\lambda) = |\lambda E - A|$.
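Definitions (50) and (51) can be turned directly into a (very slow) procedure. The sketch below is a hypothetical illustration assuming small n and rational entries: it forms every minor of order p of $\lambda E - A$ by Laplace expansion, takes the monic greatest common divisor to obtain $D_p(\lambda)$, and divides consecutive $D_p$ as in (50). Polynomials are lists of coefficients with the lowest degree first; all function names are invented for the example.

```python
from fractions import Fraction
from itertools import combinations

# A polynomial is a list of Fractions, lowest degree first; [] is zero.


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])


def pmul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def pdivmod(p, q):
    """Division with remainder, p = quot * q + rem (q non-zero)."""
    rem = [Fraction(c) for c in p]
    quot = [Fraction(0)] * max(len(p) - len(q) + 1, 1)
    while rem and len(rem) >= len(q):
        shift = len(rem) - len(q)
        c = rem[-1] / q[-1]
        quot[shift] = c
        for i, b in enumerate(q):
            rem[i + shift] -= c * b
        rem = trim(rem)
    return trim(quot), rem


def pgcd(p, q):
    """Monic greatest common divisor (Euclid); pgcd([], q) is q made monic."""
    while q:
        p, q = q, pdivmod(p, q)[1]
    return [c / p[-1] for c in p] if p else []


def pdet(M):
    """Determinant of a square matrix of polynomials by Laplace expansion."""
    if not M:
        return [Fraction(1)]
    total = []
    for j, entry in enumerate(M[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in M[1:]]
            term = pmul(entry, pdet(minor))
            total = padd(total, term if j % 2 == 0 else [-c for c in term])
    return total


def invariant_polynomials(A):
    """[i_1, ..., i_n] computed from the divisors D_p as in (50)."""
    n = len(A)
    char_matrix = [[trim([Fraction(-A[i][j]), Fraction(int(i == j))])
                    for j in range(n)] for i in range(n)]
    D = [[Fraction(1)]]                                # D_0 = 1
    for p in range(1, n + 1):
        g = []
        for rows in combinations(range(n), p):
            for cols in combinations(range(n), p):
                minor = [[char_matrix[r][c] for c in cols] for r in rows]
                g = pgcd(g, pdet(minor))
        D.append(g)                                    # D_p, monic
    return [pdivmod(D[n - k + 1], D[n - k])[0] for k in range(1, n + 1)]


# Jordan blocks of orders 2 and 1 for the eigenvalue 2:
result = invariant_polynomials([[2, 1, 0], [0, 2, 0], [0, 0, 2]])
print([[int(c) for c in p] for p in result])   # [[4, -4, 1], [-2, 1], [1]]
```

The output reads $i_1(\lambda) = (\lambda - 2)^2$, $i_2(\lambda) = \lambda - 2$, $i_3(\lambda) = 1$. The cost grows with the number of minors, so this only makes the definitions concrete; it is not meant as a practical algorithm.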

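The name 'invariant polynomial' is justified by the fact that two similar matrices A and $\tilde{A}$,

$$\tilde{A} = T^{-1} A T, \qquad (53)$$

always have identical invariant polynomials

$$i_p(\lambda) = \tilde{i}_p(\lambda) \qquad (p = 1, 2, \ldots, n). \qquad (54)$$

For it follows from (53) that

$$\tilde{A}_\lambda = \lambda E - \tilde{A} = T^{-1}(\lambda E - A)T = T^{-1}A_\lambda T. \qquad (55)$$

Hence (see Chapter I, § 2) we obtain a relation between the minors of the similar matrices $A_\lambda$ and $\tilde{A}_\lambda$:

$$\tilde{A}_\lambda\begin{pmatrix} i_1 & i_2 & \cdots & i_p\\ k_1 & k_2 & \cdots & k_p \end{pmatrix} = \sum_{\substack{1 \le \alpha_1 < \cdots < \alpha_p \le n\\ 1 \le \beta_1 < \cdots < \beta_p \le n}} T^{-1}\begin{pmatrix} i_1 & \cdots & i_p\\ \alpha_1 & \cdots & \alpha_p \end{pmatrix} A_\lambda\begin{pmatrix} \alpha_1 & \cdots & \alpha_p\\ \beta_1 & \cdots & \beta_p \end{pmatrix} T\begin{pmatrix} \beta_1 & \cdots & \beta_p\\ k_1 & \cdots & k_p \end{pmatrix} \qquad (p = 1, 2, \ldots, n). \qquad (56)$$

This equation shows that every common divisor of all the minors of order p of $A_\lambda$ is a common divisor of all the minors of order p of $\tilde{A}_\lambda$, and vice versa (since A and $\tilde{A}$ can interchange places). Hence it follows that $D_p(\lambda) = \tilde{D}_p(\lambda)$ ($p = 1, 2, \ldots, n$) and that (54) holds.

Since all the matrices representing a given operator A in various bases are similar and therefore have the same invariant polynomials and the same elementary divisors, we can speak of the invariant polynomials and the elementary divisors of an operator A.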
2. We choose now for A the matrix $L_I$ having the first natural normal form and we compute the invariant polynomials of A starting from the form of the matrix $A_\lambda = \lambda E - A$ (in (57) this matrix is written out for a case with $t = 4$ diagonal blocks, of orders 5, 4, 4, and 3):




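$$A_\lambda = \lambda E - A = \left\{
\begin{pmatrix} \lambda & 0 & 0 & 0 & \alpha_5\\ -1 & \lambda & 0 & 0 & \alpha_4\\ 0 & -1 & \lambda & 0 & \alpha_3\\ 0 & 0 & -1 & \lambda & \alpha_2\\ 0 & 0 & 0 & -1 & \lambda + \alpha_1 \end{pmatrix},\
\begin{pmatrix} \lambda & 0 & 0 & \beta_4\\ -1 & \lambda & 0 & \beta_3\\ 0 & -1 & \lambda & \beta_2\\ 0 & 0 & -1 & \lambda + \beta_1 \end{pmatrix},\
\begin{pmatrix} \lambda & 0 & 0 & \gamma_4\\ -1 & \lambda & 0 & \gamma_3\\ 0 & -1 & \lambda & \gamma_2\\ 0 & 0 & -1 & \lambda + \gamma_1 \end{pmatrix},\
\begin{pmatrix} \lambda & 0 & \varepsilon_3\\ -1 & \lambda & \varepsilon_2\\ 0 & -1 & \lambda + \varepsilon_1 \end{pmatrix}
\right\} \qquad (57)$$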


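Using Laplace's Theorem, we find

$$D_n(\lambda) = |\lambda E - A| = |\lambda E - L_1|\,|\lambda E - L_2| \cdots |\lambda E - L_t| = \psi_1(\lambda)\psi_2(\lambda)\cdots\psi_t(\lambda). \qquad (58)$$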


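Now let us find $D_{n-1}(\lambda)$. We consider the minor of the element $\alpha_p$. Crossing out the first row and the p-th column leaves in the first diagonal block a triangular matrix with $-1$'s on its diagonal, so this minor, apart from a factor $\pm 1$, is equal to

$$|\lambda E - L_2| \cdots |\lambda E - L_t| = \psi_2(\lambda)\cdots\psi_t(\lambda). \qquad (59)$$

We shall show that this minor of order $n - 1$ is a divisor of all the other minors of order $n - 1$, so that

$$D_{n-1}(\lambda) = \psi_2(\lambda)\cdots\psi_t(\lambda). \qquad (60)$$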


For this purpose we first take the minor of an element outside the diagonal blocks and show that it vanishes. To obtain this minor we have to suppress one row and one column in the matrix (57). The lines crossed out in this case intersect two distinct diagonal blocks, so that in each of these blocks only one line is crossed out. Suppose, for example, that in the j-th diagonal block one of the rows is crossed out. In the minor we take that vertical strip which contains this diagonal block. In this strip, which has s columns, all the rows except $s - 1$ rows consist entirely of zeros (we have denoted the order of $L_j$ by s). Expanding the determinant of order $n - 1$ by Laplace's Theorem with respect to the minors of order s in this strip, we see that it is equal to zero.




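Now we take the minor of an element inside one of the diagonal blocks. In this case the lines crossed out 'mutilate' only one of the diagonal blocks, say the j-th, and the matrix of the minor is again quasi-diagonal. Therefore the minor is equal to

$$\psi_1(\lambda)\cdots\psi_{j-1}(\lambda)\,\psi_{j+1}(\lambda)\cdots\psi_t(\lambda)\,\chi(\lambda), \qquad (61)$$

where $\chi(\lambda)$ is the determinant of the 'mutilated' j-th diagonal block. Since $\psi_i(\lambda)$ is divisible by $\psi_{i+1}(\lambda)$ ($i = 1, 2, \ldots, t - 1$), the product $\psi_1(\lambda)\cdots\psi_{j-1}(\lambda)$ is divisible by $\psi_2(\lambda)\cdots\psi_j(\lambda)$, and hence the product (61) is divisible by (59). Thus, equation (60) can be regarded as proved. By similar arguments we obtain:

$$D_{n-2}(\lambda) = \psi_3(\lambda)\cdots\psi_t(\lambda),\quad \ldots,\quad D_{n-t+1}(\lambda) = \psi_t(\lambda),\quad D_{n-t}(\lambda) = \cdots = D_1(\lambda) = 1. \qquad (62)$$

From (58), (60), and (62) we find:

$$i_1(\lambda) = \frac{D_n(\lambda)}{D_{n-1}(\lambda)} = \psi_1(\lambda),\quad i_2(\lambda) = \frac{D_{n-1}(\lambda)}{D_{n-2}(\lambda)} = \psi_2(\lambda),\quad \ldots,\quad i_t(\lambda) = \frac{D_{n-t+1}(\lambda)}{D_{n-t}(\lambda)} = \psi_t(\lambda),\quad i_{t+1}(\lambda) = \cdots = i_n(\lambda) = 1. \qquad (63)$$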
The formulas (63) show that the polynomials $\psi_1(\lambda), \psi_2(\lambda), \ldots, \psi_t(\lambda)$ coincide with the invariant polynomials, other than 1, of the operator A (or the corresponding matrix A).
Let us give three equivalent formulations of the results obtained:


THEOREM 9 (More precise form of the Second Decomposition Theorem): If A is a linear operator in R, then the space R can be decomposed into cyclic subspaces

$$R = I_1 + I_2 + \cdots + I_t$$

such that in the sequence of minimal polynomials $\psi_1(\lambda), \psi_2(\lambda), \ldots, \psi_t(\lambda)$ of the subspaces $I_1, I_2, \ldots, I_t$ each is divisible by the following. The polynomials $\psi_1(\lambda), \psi_2(\lambda), \ldots, \psi_t(\lambda)$ are uniquely determined: they coincide with the invariant polynomials, other than 1, of the operator A.


THEOREM 9': For every linear operator A in R there exists a basis in which the matrix $L_I$ that gives the operator is of the first natural normal form. This matrix is uniquely determined when the operator A is given: the characteristic polynomials of the diagonal blocks of $L_I$ are the invariant polynomials of A.




THEOREM 9'': In every class of similar matrices (with elements in F) there exists one and only one matrix $L_I$ having the first natural normal form. The characteristic polynomials of the diagonal blocks of $L_I$ coincide with the invariant polynomials (other than 1) of every matrix of that class.

On p. 194 we established that two similar matrices have the same invariant polynomials. Now suppose, conversely, that two matrices A and B with elements in F are known to have the same invariant polynomials. Since the matrix $L_I$ is uniquely determined when these polynomials are given, the two matrices A and B are similar to one and the same matrix $L_I$ and, therefore, to each other. We thus arrive at the following proposition:


THEOREM 10: Two matrices with elements in F are similar if and only if they have the same invariant polynomials.¹³


3. The characteristic polynomial $\Delta(\lambda)$ of the operator A coincides with $D_n(\lambda)$, and hence with the product of all invariant polynomials:

$$\Delta(\lambda) = \psi_1(\lambda)\psi_2(\lambda)\cdots\psi_t(\lambda). \qquad (64)$$

But $\psi_1(\lambda)$ is the minimal polynomial of the whole space with respect to A; hence $\psi_1(A) = O$ and by (64)

$$\Delta(A) = O. \qquad (65)$$


Thus we have incidentally obtained the Hamilton-Cayley Theorem (see Chapter IV, § 4):
Every linear operator (every square matrix) satisfies its characteristic equation.
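A quick numerical check of (65), offered as an illustration rather than a proof: the hypothetical helpers below compute the coefficients of $\Delta(\lambda)$ (by the Faddeev-LeVerrier recursion, which is not the method used in the text) and evaluate $\Delta(A)$ by Horner's scheme in exact rational arithmetic.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def char_poly(A):
    """[1, c_1, ..., c_n] of |lam E - A| by the Faddeev-LeVerrier recursion."""
    n = len(A)
    coeffs, M = [Fraction(1)], [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs


def poly_at_matrix(coeffs, A):
    """Evaluate c_0 A^n + c_1 A^(n-1) + ... + c_n E by Horner's scheme."""
    n = len(A)
    E = identity(n)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs:
        result = matmul(result, A)
        result = [[result[i][j] + c * E[i][j] for j in range(n)]
                  for i in range(n)]
    return result


A = [[1, 2, 0], [3, 4, 5], [0, 6, 7]]
delta_of_A = poly_at_matrix(char_poly(A), A)
print(all(x == 0 for row in delta_of_A for x in row))   # True: Delta(A) = O
```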
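In § 4, by splitting the polynomials $\psi_1(\lambda), \psi_2(\lambda), \ldots, \psi_t(\lambda)$ into irreducible factors over F:

$$\begin{aligned}
\psi_1(\lambda) &= [\varphi_1(\lambda)]^{c_1}[\varphi_2(\lambda)]^{c_2}\cdots[\varphi_s(\lambda)]^{c_s},\\
\psi_2(\lambda) &= [\varphi_1(\lambda)]^{d_1}[\varphi_2(\lambda)]^{d_2}\cdots[\varphi_s(\lambda)]^{d_s},\\
&\;\;\vdots\\
\psi_t(\lambda) &= [\varphi_1(\lambda)]^{l_1}[\varphi_2(\lambda)]^{l_2}\cdots[\varphi_s(\lambda)]^{l_s}
\end{aligned}
\qquad (c_k \ge d_k \ge \cdots \ge l_k \ge 0;\ k = 1, 2, \ldots, s), \qquad (66)$$

we were led to the third decomposition theorem. To each power with non-zero exponent on the right-hand sides of (66) there corresponds an invariant subspace in this decomposition.
By (63) all the powers, other than 1, among $[\varphi_k(\lambda)]^{c_k}, \ldots, [\varphi_k(\lambda)]^{l_k}$ ($k = 1, 2, \ldots, s$) are the elementary divisors of the operator A (or of the matrix A) in the field F (see p. 194). Thus we arrive at the following more precise statement of the third decomposition theorem: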


13 Or (what is the same) the same elementary divisors in the field F. 




THEOREM 11: If A is a linear operator in a vector space R over a field F, then R can be split into cyclic subspaces whose minimal polynomials are the elementary divisors of A in F.

Let

$$R = I' + I'' + \cdots + I^{(u)} \qquad (67)$$

be such a decomposition. We denote by $e', e'', \ldots, e^{(u)}$ generating vectors of the subspaces $I', I'', \ldots, I^{(u)}$ and from the 'cyclic' bases of these subspaces we form a basis of the whole space

$$e', Ae', \ldots;\quad e'', Ae'', \ldots;\quad \ldots;\quad e^{(u)}, Ae^{(u)}, \ldots. \qquad (68)$$


It is easy to see that the matrix $L_{II}$ corresponding to the operator A in the basis (68) has quasi-diagonal form, like $L_I$:

$$L_{II} = \{L^{(1)}, L^{(2)}, \ldots, L^{(u)}\}. \qquad (69)$$

The diagonal blocks $L^{(1)}, L^{(2)}, \ldots, L^{(u)}$ are of the same structure as the blocks (47) and (48) of $L_I$. However, the characteristic polynomials of these diagonal blocks are not the invariant polynomials, but the elementary divisors of A. The matrix $L_{II}$ has the second natural normal form (see § 5).

We have arrived at another formulation of Theorem 11: 


THEOREM 11': For every linear operator A in R (over the field F) there exists a basis in which the matrix $L_{II}$ giving the operator is of the second natural normal form; the characteristic polynomials of the diagonal blocks are the elementary divisors of A in F.


This theorem also admits a formulation in terms of matrices: 

THEOREM 11'': A matrix A with elements in the field F is always similar to a matrix $L_{II}$ having the second natural normal form in which the characteristic polynomials of the diagonal blocks are the elementary divisors of A.

Theorem 11 and the associated Theorems 11’ and 11” have, in a certain 
sense, a converse. 

Let

$$R = I' + I'' + \cdots + I^{(u)}$$

be an arbitrary decomposition of a space R into indecomposable invariant subspaces. Then by Theorem 7 the subspaces $I', I'', \ldots, I^{(u)}$ are cyclic and their minimal polynomials are powers of irreducible polynomials over F. We may write these powers, after adding powers with zero exponent if necessary, in the form¹⁴

14 At least one of the numbers $l_1, l_2, \ldots, l_s$ is positive.




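$$\begin{matrix}
[\varphi_1(\lambda)]^{c_1}, & [\varphi_2(\lambda)]^{c_2}, & \ldots, & [\varphi_s(\lambda)]^{c_s},\\
[\varphi_1(\lambda)]^{d_1}, & [\varphi_2(\lambda)]^{d_2}, & \ldots, & [\varphi_s(\lambda)]^{d_s},\\
\vdots & \vdots & & \vdots\\
[\varphi_1(\lambda)]^{l_1}, & [\varphi_2(\lambda)]^{l_2}, & \ldots, & [\varphi_s(\lambda)]^{l_s}
\end{matrix}
\qquad (c_k \ge d_k \ge \cdots \ge l_k \ge 0;\ k = 1, 2, \ldots, s). \qquad (70)$$

We denote the sum of the subspaces whose minimal polynomials are in the first row by $I_1$. Similarly, we introduce $I_2, \ldots, I_t$ (t is the number of rows in (70)). By Theorem 6, the subspaces $I_1, I_2, \ldots, I_t$ are cyclic and their minimal polynomials $\psi_1(\lambda), \psi_2(\lambda), \ldots, \psi_t(\lambda)$ are determined by the formulas (66). Here in the sequence $\psi_1(\lambda), \psi_2(\lambda), \ldots, \psi_t(\lambda)$ each polynomial is divisible by the following. But then Theorem 9 is immediately applicable to the decomposition

$$R = I_1 + I_2 + \cdots + I_t.$$

By this theorem

$$\psi_p(\lambda) = i_p(\lambda) \qquad (p = 1, 2, \ldots, t),$$

and therefore, by (66), all the powers (70) with non-zero exponent are the elementary divisors of A in the field F. Thus we have the following theorem: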


THEOREM 12: If the vector space R (over the field F) is split in any way into indecomposable invariant subspaces (with respect to an operator A), then the minimal polynomials of these subspaces are all the elementary divisors of A in F.

There is an equivalent formulation in terms of matrices: 

THEOREM 12': In each class of similar matrices (with elements in F) there exists only one matrix (to within the order of the diagonal blocks) having the second normal form $L_{II}$; the characteristic polynomials of its diagonal blocks are the elementary divisors of every matrix of the given class.


Suppose that the space R is split into two invariant subspaces (with respect to an operator A)

$$R = I_1 + I_2.$$

When we split $I_1$ and $I_2$ into indecomposable subspaces, we obtain at the same time a decomposition of the whole space R into indecomposable subspaces. Hence, bearing Theorem 12 in mind, we obtain:

THEOREM 13: If the space R is split into invariant subspaces with respect to an operator A, then the elementary divisors of A in each of these invariant subspaces, taken in their totality, form a complete system of elementary divisors of A in R.


This theorem has the following matrix form: 




THEOREM 13': A complete system of elementary divisors in F of a quasi-diagonal matrix is obtained as the union of the elementary divisors of the diagonal blocks.

Theorem 13' is often used for the actual process of finding the elementary divisors of a matrix.


§ 7. The Jordan Normal Form of a Matrix 


1. Suppose that all the roots of the characteristic polynomial $\Delta(\lambda)$ of an operator A belong to the field F. This will hold true, in particular, if F is the field of all complex numbers.

In this case, the decomposition of the invariant polynomials into ele- 
mentary divisors in F will look as follows: 


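$$\begin{aligned}
i_1(\lambda) &= (\lambda - \lambda_1)^{c_1}(\lambda - \lambda_2)^{c_2}\cdots(\lambda - \lambda_s)^{c_s},\\
i_2(\lambda) &= (\lambda - \lambda_1)^{d_1}(\lambda - \lambda_2)^{d_2}\cdots(\lambda - \lambda_s)^{d_s},\\
&\;\;\vdots\\
i_t(\lambda) &= (\lambda - \lambda_1)^{l_1}(\lambda - \lambda_2)^{l_2}\cdots(\lambda - \lambda_s)^{l_s}
\end{aligned}
\qquad (c_k \ge d_k \ge \cdots \ge l_k \ge 0,\ c_k > 0;\ k = 1, 2, \ldots, s). \qquad (71)$$

Since the product of all the invariant polynomials is equal to the characteristic polynomial $\Delta(\lambda)$, $\lambda_1, \lambda_2, \ldots, \lambda_s$ in (71) are all the distinct roots of $\Delta(\lambda)$.
We take an arbitrary elementary divisor

$$(\lambda - \lambda_0)^p; \qquad (72)$$

here $\lambda_0$ is one of the numbers $\lambda_1, \lambda_2, \ldots, \lambda_s$ and p is one of the (non-zero) exponents $c_k, d_k, \ldots, l_k$ ($k = 1, 2, \ldots, s$).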

To this elementary divisor there corresponds in (67) a definite cyclic subspace I, generated by a vector which we denote by e. For this vector $(\lambda - \lambda_0)^p$ is the minimal polynomial.

We consider the vectors

$$e_1 = (A - \lambda_0 E)^{p-1}e,\quad e_2 = (A - \lambda_0 E)^{p-2}e,\quad \ldots,\quad e_p = e. \qquad (73)$$

The vectors $e_1, e_2, \ldots, e_p$ are linearly independent, since otherwise there would be an annihilating polynomial for e of degree less than p, which is impossible. Now we note that

$$(A - \lambda_0 E)e_1 = o,\quad (A - \lambda_0 E)e_2 = e_1,\quad \ldots,\quad (A - \lambda_0 E)e_p = e_{p-1} \qquad (74)$$

or

$$Ae_1 = \lambda_0 e_1,\quad Ae_2 = \lambda_0 e_2 + e_1,\quad \ldots,\quad Ae_p = \lambda_0 e_p + e_{p-1}. \qquad (75)$$




With the help of (75) we can easily write down the matrix corresponding to A in I for the basis (73). This matrix looks as follows:

$$\begin{pmatrix} \lambda_0 & 1 & 0 & \cdots & 0\\ 0 & \lambda_0 & 1 & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_0 & 1\\ 0 & 0 & \cdots & 0 & \lambda_0 \end{pmatrix} = \lambda_0 E^{(p)} + H^{(p)}, \qquad (76)$$

where $E^{(p)}$ is the unit matrix of order p and $H^{(p)}$ the matrix of order p which has 1's along the first superdiagonal and 0's everywhere else.

Linearly independent vectors $e_1, e_2, \ldots, e_p$ for which (75) holds form a so-called Jordan chain of vectors in I. From Jordan chains connected with each subspace $I', I'', \ldots, I^{(u)}$ we form a Jordan basis of R. If we now denote the minimal polynomials of these subspaces, i.e., the elementary divisors of A, by

$$(\lambda - \lambda_1)^{p_1},\ (\lambda - \lambda_2)^{p_2},\ \ldots,\ (\lambda - \lambda_u)^{p_u} \qquad (77)$$

(the numbers $\lambda_1, \lambda_2, \ldots, \lambda_u$ need not all be distinct), then the matrix J corresponding to A in a Jordan basis has the following quasi-diagonal form:

$$J = \{\lambda_1 E^{(p_1)} + H^{(p_1)},\ \lambda_2 E^{(p_2)} + H^{(p_2)},\ \ldots,\ \lambda_u E^{(p_u)} + H^{(p_u)}\}. \qquad (78)$$

We shall say of the matrix J that it is of Jordan normal form or simply Jordan form. The matrix J can be written down at once when the elementary divisors of A in the field F containing all the characteristic roots of the equation $\Delta(\lambda) = 0$ are known.
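Since J is determined by the elementary divisors alone, writing it down is purely mechanical. The Python sketch below assumes the elementary divisors are supplied as pairs $(\lambda_i, p_i)$ (a representation chosen for this example; the function names are hypothetical), assembles (78), and checks the relations (74) for the first block.

```python
def jordan_block(lam, p):
    """lam E^(p) + H^(p): lam on the diagonal, 1's on the first superdiagonal."""
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(p)]
            for i in range(p)]


def jordan_matrix(divisors):
    """Quasi-diagonal matrix (78) for elementary divisors (lam - lam_i)^(p_i),
    given as pairs (lam_i, p_i); the lam_i need not be distinct."""
    n = sum(p for _, p in divisors)
    J = [[0] * n for _ in range(n)]
    offset = 0
    for lam, p in divisors:
        block = jordan_block(lam, p)
        for i in range(p):
            for j in range(p):
                J[offset + i][offset + j] = block[i][j]
        offset += p
    return J


def apply(M, v):
    """Coordinates of the vector M v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# Elementary divisors (lam - 2)^2 and (lam - 3)
J = jordan_matrix([(2, 2), (3, 1)])
print(J)                                           # [[2, 1, 0], [0, 2, 0], [0, 0, 3]]

# Relations (74) for the chain e_1, e_2 of the first block
N = [[J[i][j] - (2 if i == j else 0) for j in range(3)] for i in range(3)]
print(apply(N, [1, 0, 0]), apply(N, [0, 1, 0]))   # [0, 0, 0] [1, 0, 0]
```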

Every matrix A is similar to a matrix J of Jordan normal form, i.e., for an arbitrary matrix A there always exists a non-singular matrix T ($|T| \ne 0$) such that

$$A = TJT^{-1}.$$

If all the elementary divisors of A are of the first degree (and in that case only), the Jordan form is a diagonal matrix and we have:

$$A = T\{\lambda_1, \lambda_2, \ldots, \lambda_n\}T^{-1}.$$

Thus: A linear operator A has simple structure (see Chapter III, § 8) if and only if all the elementary divisors of A are linear.

Let us number the vectors $e_1, e_2, \ldots, e_p$ defined by (73) in the reverse order:

$$g_1 = e_p = e,\quad g_2 = e_{p-1} = (A - \lambda_0 E)e,\quad \ldots,\quad g_p = e_1 = (A - \lambda_0 E)^{p-1}e. \qquad (79)$$




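Then

$$(A - \lambda_0 E)g_1 = g_2,\quad (A - \lambda_0 E)g_2 = g_3,\quad \ldots,\quad (A - \lambda_0 E)g_p = o;$$

hence

$$Ag_1 = \lambda_0 g_1 + g_2,\quad Ag_2 = \lambda_0 g_2 + g_3,\quad \ldots,\quad Ag_p = \lambda_0 g_p.$$

The vectors (79) form a basis in the cyclic invariant subspace I that corresponds in (67) to the elementary divisor $(\lambda - \lambda_0)^p$.
In this basis, as is easy to see, to the operator A there corresponds the matrix

$$\begin{pmatrix} \lambda_0 & 0 & 0 & \cdots & 0\\ 1 & \lambda_0 & 0 & \cdots & 0\\ 0 & 1 & \lambda_0 & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 & \lambda_0 \end{pmatrix} = \lambda_0 E^{(p)} + F^{(p)},$$

where $F^{(p)}$ is the matrix of order p which has 1's along the first subdiagonal and 0's everywhere else.

We shall say of the vectors (79) that they form a lower Jordan chain of vectors. If we take a lower Jordan chain of vectors in each subspace $I', I'', \ldots, I^{(u)}$ of (67), we can form from these chains a lower Jordan basis in which to the operator A there corresponds the quasi-diagonal matrix

$$J_1 = \{\lambda_1 E^{(p_1)} + F^{(p_1)},\ \lambda_2 E^{(p_2)} + F^{(p_2)},\ \ldots,\ \lambda_u E^{(p_u)} + F^{(p_u)}\}. \qquad (80)$$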


We shall say of the matrix $J_1$ that it is of lower Jordan form. In contrast to (80), we shall sometimes call (78) an upper Jordan matrix.

Thus: Every matrix A is similar to an upper and to a lower Jordan matrix.
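Renumbering a basis in reverse order replaces a matrix M by $PMP$, where P is the reversal permutation matrix ($P = P^{-1}$). The short sketch below (hypothetical helper names, integer entries assumed) checks that this turns the upper block $\lambda_0 E^{(p)} + H^{(p)}$ into the lower block $\lambda_0 E^{(p)} + F^{(p)}$, which is exactly the passage from the basis (73) to the basis (79).

```python
def jordan_block(lam, p, lower=False):
    """lam E^(p) + H^(p) (upper form) or lam E^(p) + F^(p) (lower form)."""
    def entry(i, j):
        if i == j:
            return lam
        if (j == i + 1 and not lower) or (i == j + 1 and lower):
            return 1
        return 0
    return [[entry(i, j) for j in range(p)] for i in range(p)]


def reverse_basis(M):
    """Matrix of the same operator in the reversed basis: (P M P)[i][j] equals
    M[p-1-i][p-1-j] for the reversal permutation P."""
    p = len(M)
    return [[M[p - 1 - i][p - 1 - j] for j in range(p)] for i in range(p)]


upper = jordan_block(7, 3)               # [[7, 1, 0], [0, 7, 1], [0, 0, 7]]
lower = jordan_block(7, 3, lower=True)   # [[7, 0, 0], [1, 7, 0], [0, 1, 7]]
print(reverse_basis(upper) == lower)     # True
```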


§ 8. Krylov’s Method of Transforming the Secular Equation 


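1. When a matrix $A = \| a_{ik} \|_1^n$ is given, then its characteristic (secular) equation can be written in the form

$$\Delta(\lambda) \equiv (-1)^n \begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} - \lambda & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda \end{vmatrix} = 0. \qquad (81)$$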


On the left-hand side of this equation is the characteristic polynomial $\Delta(\lambda)$ of degree n. For the direct computation of the coefficients of this polynomial it is necessary to expand the characteristic determinant $|A - \lambda E|$; and for large n this involves very cumbersome computational work, because $\lambda$ occurs in the diagonal elements of the determinant.¹⁵
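To make the cost of the direct approach concrete, the sketch below (an illustration with hypothetical names; it is not Krylov's method) computes the coefficients of $\Delta(\lambda) = |\lambda E - A|$ from sums of principal minors, as described in footnote 15. In this formulation the coefficient of $\lambda^{n-k}$ is $(-1)^k$ times the sum of the principal minors of order k, and there are $\binom{n}{k}$ such minors.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def det(M):
    """Determinant by Gaussian elimination over the rationals (det([]) = 1)."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result
        result *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    return result


def char_poly_by_minors(A):
    """[1, c_1, ..., c_n] for |lam E - A|: c_k is (-1)^k times the sum of the
    comb(n, k) principal minors of order k."""
    n = len(A)
    coeffs = []
    for k in range(n + 1):
        s = sum((det([[A[i][j] for j in idx] for i in idx])
                 for idx in combinations(range(n), k)), Fraction(0))
        coeffs.append((-1) ** k * s)
    return coeffs


print([int(c) for c in char_poly_by_minors([[1, 2], [3, 4]])])   # [1, -5, -2]
# Numbers of determinants quoted in footnote 15 for n = 6:
print(comb(6, 5), comb(6, 4))                                     # 6 15
```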

In 1937, A. N. Krylov [251] proposed a transformation of the characteristic determinant as a result of which $\lambda$ occurs only in the elements of one column (or row).

Krylov's transformation simplifies the computation of the coefficients of the characteristic equation considerably.¹⁶

In this section we shall give an algebraic method of transforming the characteristic equation which differs somewhat from Krylov's own method.¹⁷

We consider an n-dimensional vector space R with basis $e_1, e_2, \ldots, e_n$ and the linear operator A in R determined by a given matrix $A = \| a_{ik} \|_1^n$ in this basis. We take an arbitrary vector $x \ne o$ in R and form the sequence of vectors

$$x,\ Ax,\ A^2x,\ \ldots. \qquad (82)$$

Suppose that the first p vectors $x, Ax, \ldots, A^{p-1}x$ of this sequence are linearly independent and that the $(p+1)$-st vector $A^p x$ is a linear combination of these p vectors:

$$A^p x = -\alpha_p x - \alpha_{p-1}Ax - \cdots - \alpha_1 A^{p-1}x \qquad (83)$$

or

$$\varphi(A)x = o, \qquad (84)$$

where

$$\varphi(\lambda) = \lambda^p + \alpha_1\lambda^{p-1} + \cdots + \alpha_p. \qquad (85)$$


All the further vectors in (82) can also be expressed linearly by the first p vectors of the sequence.¹⁸ Thus, in (82) there are p linearly independent


15 We recall that the coefficient of $\lambda^k$ in $\Delta(\lambda)$ is equal (apart from the sign) to the sum of all the principal minors of order $n - k$ in A ($k = 1, 2, \ldots, n$). Thus, even for $n = 6$, the direct determination of the coefficient of $\lambda$ in $\Delta(\lambda)$ would require the computation of six determinants of order 5; that of $\lambda^2$ would require fifteen determinants of order 4; etc.


16 The algebraic analysis of Krylov’s method of transforming the secular equation 
is contained in a number of papers [268], [269], [211], [168], and [149]. 

17 Krylov arrived at his method of transformation by starting from a system of n linear differential equations with constant coefficients. Krylov's approach in algebraic form can be found, for example, in [268] and [168] and in § 21 of the book [25].


18 When we apply the operator A to both sides of (83) we express $A^{p+1}x$ linearly in terms of $Ax, \ldots, A^{p-1}x, A^p x$. But $A^p x$, by (83), is expressed linearly in terms of $x, Ax, \ldots, A^{p-1}x$. Hence we obtain a similar expression for $A^{p+1}x$. By applying the operator A to the expression thus obtained for $A^{p+1}x$, we express $A^{p+2}x$ in terms of $Ax, \ldots, A^p x$, etc.




vectors and this maximal number of linearly independent vectors in (82) 
is always realized by the first p vectors. 

The polynomial $\varphi(\lambda)$ is the minimal (annihilating) polynomial of the vector x with respect to the operator A (see § 1). The method of Krylov consists in an effective determination of the minimal polynomial $\varphi(\lambda)$ of x.

We consider separately two cases: the regular case, where $p = n$; and the singular case, where $p < n$.

The polynomial $\varphi(\lambda)$ is a divisor of the minimal polynomial $\psi(\lambda)$ of the whole space R,¹⁹ and $\psi(\lambda)$ in turn is a divisor of the characteristic polynomial $\Delta(\lambda)$. Therefore $\varphi(\lambda)$ is always a divisor of $\Delta(\lambda)$.

In the regular case, $\varphi(\lambda)$ and $\Delta(\lambda)$ are of the same degree and, since their highest coefficients are equal, they coincide. Thus, in the regular case

$$\Delta(\lambda) \equiv \psi(\lambda) \equiv \varphi(\lambda),$$

and therefore in the regular case Krylov's method is a method of computing the coefficients of the characteristic polynomial $\Delta(\lambda)$.

In the singular case, as we shall see later, Krylov's method does not enable us to determine $\Delta(\lambda)$, and in this case it only determines the divisor $\varphi(\lambda)$ of $\Delta(\lambda)$.

In explaining Krylov's transformation, we shall denote the coordinates of x in the given basis $e_1, e_2, \ldots, e_n$ by $a, b, \ldots, l$, and the coordinates of the vector $A^k x$ by $a_k, b_k, \ldots, l_k$ ($k = 1, 2, \ldots, n$).


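2. Regular case: $p = n$. In this case, the vectors $x, Ax, \ldots, A^{n-1}x$ are linearly independent and the equations (83), (84), and (85) assume the form

$$A^n x = -\alpha_n x - \alpha_{n-1}Ax - \cdots - \alpha_1 A^{n-1}x \qquad (86)$$

or

$$\Delta(A)x = o, \qquad (87)$$

where

$$\Delta(\lambda) = \lambda^n + \alpha_1\lambda^{n-1} + \cdots + \alpha_{n-1}\lambda + \alpha_n. \qquad (88)$$

The condition of linear independence of the vectors $x, Ax, \ldots, A^{n-1}x$ may be written analytically as follows (see Chapter III, § 1):

$$M = \begin{vmatrix} a & b & \cdots & l\\ a_1 & b_1 & \cdots & l_1\\ \vdots & \vdots & & \vdots\\ a_{n-1} & b_{n-1} & \cdots & l_{n-1} \end{vmatrix} \ne 0. \qquad (89)$$

We consider the matrix formed from the coordinates of the vectors $x, Ax, \ldots, A^n x$: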


19 $\psi(\lambda)$ is the minimal polynomial of A.




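$$\begin{pmatrix} a & b & \cdots & l\\ a_1 & b_1 & \cdots & l_1\\ \vdots & \vdots & & \vdots\\ a_{n-1} & b_{n-1} & \cdots & l_{n-1}\\ a_n & b_n & \cdots & l_n \end{pmatrix}. \qquad (90)$$

In the regular case the rank of this matrix is n. The first n rows of the matrix are linearly independent, and the last, $(n+1)$-st, row is a linear combination of the preceding n.

We obtain the dependence between the rows of (90) when we replace the vector equation (86) by the equivalent system of n scalar equations

$$\begin{aligned}
-\alpha_n a - \alpha_{n-1}a_1 - \cdots - \alpha_1 a_{n-1} &= a_n,\\
-\alpha_n b - \alpha_{n-1}b_1 - \cdots - \alpha_1 b_{n-1} &= b_n,\\
&\;\;\vdots\\
-\alpha_n l - \alpha_{n-1}l_1 - \cdots - \alpha_1 l_{n-1} &= l_n.
\end{aligned} \qquad (91)$$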


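From this system of n linear equations we may determine the unknown coefficients $\alpha_1, \alpha_2, \ldots, \alpha_n$ uniquely,²⁰ and substitute their values in (88). This elimination of $\alpha_1, \alpha_2, \ldots, \alpha_n$ from (88) and (91) can be performed symmetrically. For this purpose we rewrite (88) and (91) as follows:

$$\begin{aligned}
a\alpha_n + a_1\alpha_{n-1} + \cdots + a_{n-1}\alpha_1 + a_n\alpha_0 &= 0\\
b\alpha_n + b_1\alpha_{n-1} + \cdots + b_{n-1}\alpha_1 + b_n\alpha_0 &= 0\\
&\;\;\vdots\\
l\alpha_n + l_1\alpha_{n-1} + \cdots + l_{n-1}\alpha_1 + l_n\alpha_0 &= 0\\
1\cdot\alpha_n + \lambda\alpha_{n-1} + \cdots + \lambda^{n-1}\alpha_1 + [\lambda^n - \Delta(\lambda)]\alpha_0 &= 0
\end{aligned} \qquad (\alpha_0 = 1).$$

Since this system of $n + 1$ equations in the $n + 1$ unknowns $\alpha_0, \alpha_1, \ldots, \alpha_n$ has a non-zero solution ($\alpha_0 = 1$), its determinant must vanish:

$$\begin{vmatrix} a & a_1 & \cdots & a_{n-1} & a_n\\ b & b_1 & \cdots & b_{n-1} & b_n\\ \vdots & \vdots & & \vdots & \vdots\\ l & l_1 & \cdots & l_{n-1} & l_n\\ 1 & \lambda & \cdots & \lambda^{n-1} & \lambda^n - \Delta(\lambda) \end{vmatrix} = 0. \qquad (92)$$

Hence we determine $\Delta(\lambda)$ after a preliminary transposition of the determinant (92) with respect to the main diagonal: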


20 By (89), the determinant of this system is different from zero.




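$$M\Delta(\lambda) = \begin{vmatrix} a & b & \cdots & l & 1\\ a_1 & b_1 & \cdots & l_1 & \lambda\\ \vdots & \vdots & & \vdots & \vdots\\ a_{n-1} & b_{n-1} & \cdots & l_{n-1} & \lambda^{n-1}\\ a_n & b_n & \cdots & l_n & \lambda^n \end{vmatrix}, \qquad (93)$$

where the constant factor M is determined by (89) and differs from zero. (The term $-\Delta(\lambda)$ in the lower right-hand corner of the transposed determinant (92) has the cofactor M, so that determinant equals the right-hand side of (93) minus $M\Delta(\lambda)$.)

The identity (93) represents Krylov's transformation. In Krylov's determinant on the right-hand side of the identity, $\lambda$ occurs only in the elements of the last column; the remaining elements of the determinant do not depend on $\lambda$.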
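In computational terms, the regular case amounts to solving the linear system (91) for $\alpha_1, \ldots, \alpha_n$. The Python sketch below does this by exact Gauss-Jordan elimination instead of expanding the determinant (93); when $M \ne 0$ both routes give the same coefficients. The function names are hypothetical, and a singular coefficient matrix (that is, $M = 0$) is reported as the singular case.

```python
from fractions import Fraction


def mat_vec(A, v):
    return [sum(Fraction(a) * b for a, b in zip(row, v)) for row in A]


def solve(M, rhs):
    """Gauss-Jordan elimination over Q; returns None if M is singular."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def krylov_char_poly(A, x):
    """Regular case p = n: [1, alpha_1, ..., alpha_n] obtained from (91)."""
    n = len(A)
    vecs = [[Fraction(c) for c in x]]
    for _ in range(n):
        vecs.append(mat_vec(A, vecs[-1]))          # x, Ax, ..., A^n x
    # Column k of the coefficient matrix holds the coordinates of A^k x.
    K = [[vecs[k][i] for k in range(n)] for i in range(n)]
    c = solve(K, vecs[n])                          # A^n x = sum_k c_k A^k x
    if c is None:                                  # M = 0 in (89)
        raise ValueError("singular case: x, Ax, ..., A^(n-1)x are dependent")
    # By (86), c_k = -alpha_(n-k)
    return [Fraction(1)] + [-c[n - j] for j in range(1, n + 1)]


print([int(a) for a in krylov_char_poly([[2, 1], [1, 2]], [1, 0])])   # [1, -4, 3]
```

With x = [1, 1], an eigenvector of the same matrix, the function raises `ValueError`, in line with the remark that the regular case depends on the choice of x.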


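Note. In the regular case, the whole space R is cyclic (with respect to A). If we choose the vectors $x, Ax, \ldots, A^{n-1}x$ as a basis, then in this basis the operator A corresponds to a matrix $\tilde{A}$ having the natural normal form

$$\tilde{A} = \begin{pmatrix} 0 & 0 & \cdots & 0 & -\alpha_n\\ 1 & 0 & \cdots & 0 & -\alpha_{n-1}\\ 0 & 1 & \cdots & 0 & -\alpha_{n-2}\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & -\alpha_1 \end{pmatrix}. \qquad (94)$$

The transition from the original basis $e_1, e_2, \ldots, e_n$ to the basis $x, Ax, \ldots, A^{n-1}x$ is accomplished by means of the non-singular transforming matrix

$$T = \begin{pmatrix} a & a_1 & \cdots & a_{n-1}\\ b & b_1 & \cdots & b_{n-1}\\ \vdots & \vdots & & \vdots\\ l & l_1 & \cdots & l_{n-1} \end{pmatrix}, \qquad (95)$$

and then

$$A = T\tilde{A}T^{-1}. \qquad (96)$$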
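The relation (96) can be checked without inverting T by verifying $AT = T\tilde{A}$. In the sketch below the coefficients $\alpha_1, \ldots, \alpha_n$ are assumed to be known already (for instance from (91)); the helper names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def krylov_basis_matrix(A, x):
    """T of (95): column k holds the coordinates of A^k x (k = 0, ..., n-1)."""
    n = len(A)
    cols = [list(x)]
    for _ in range(n - 1):
        cols.append(mat_vec(A, cols[-1]))
    return [[cols[k][i] for k in range(n)] for i in range(n)]


def natural_normal_form(alphas):
    """Matrix (94) for alphas = [alpha_1, ..., alpha_n]."""
    n = len(alphas)
    return [[-alphas[n - 1 - i] if j == n - 1 else (1 if i == j + 1 else 0)
             for j in range(n)] for i in range(n)]


A, x = [[2, 1], [1, 2]], [1, 0]
alphas = [-4, 3]          # Delta(lam) = lam^2 - 4 lam + 3, assumed known
T = krylov_basis_matrix(A, x)
A_tilde = natural_normal_form(alphas)
# (96) in the inverse-free form A T = T A_tilde
print(matmul(A, T) == matmul(T, A_tilde))   # True
```

Checking $AT = T\tilde{A}$ rather than forming $T^{-1}$ keeps the sketch in exact integer arithmetic.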
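3. Singular case: $p < n$. In this case, the vectors $x, Ax, \ldots, A^{n-1}x$ are linearly dependent, so that

$$M = \begin{vmatrix} a & b & \cdots & l\\ a_1 & b_1 & \cdots & l_1\\ \vdots & \vdots & & \vdots\\ a_{n-1} & b_{n-1} & \cdots & l_{n-1} \end{vmatrix} = 0.$$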




Now (93) was deduced under the assumption $M \ne 0$. But both sides of this equation are rational integral functions of $\lambda$ and of the parameters $a, b, \ldots, l$.²¹ Therefore it follows by a 'continuity' argument that (93) also holds for $M = 0$. But then, when Krylov's determinant is expanded, all the coefficients turn out to be zero. Thus in the singular case ($p < n$) the formula (93) goes over into the trivial identity $0 = 0$.

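Let us consider the matrix formed from the coordinates of the vectors $x, Ax, \ldots, A^px$:

$$\begin{pmatrix}
a & b & \cdots & l\\
a_1 & b_1 & \cdots & l_1\\
\vdots & \vdots & & \vdots\\
a_{p-1} & b_{p-1} & \cdots & l_{p-1}\\
a_p & b_p & \cdots & l_p
\end{pmatrix}. \qquad (97)$$

This matrix is of rank $p$ and its first $p$ rows are linearly independent. The last, $(p+1)$-st, row is a linear combination of the first $p$ rows with the coefficients $-\alpha_p, -\alpha_{p-1}, \ldots, -\alpha_1$ (see (83)). From the $n$ coordinates $a, b, \ldots, l$ we can choose $p$ coordinates $c, f, \ldots, h$ such that the determinant formed from these coordinates of the vectors $x, Ax, \ldots, A^{p-1}x$ is different from zero:

$$M^* = \begin{vmatrix}
c & f & \cdots & h\\
c_1 & f_1 & \cdots & h_1\\
\vdots & \vdots & & \vdots\\
c_{p-1} & f_{p-1} & \cdots & h_{p-1}
\end{vmatrix} \ne 0. \qquad (98)$$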


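Furthermore, it follows from (83) that:

$$\begin{aligned}
\alpha_p c + \alpha_{p-1}c_1 + \cdots + \alpha_1 c_{p-1} + c_p &= 0,\\
\alpha_p f + \alpha_{p-1}f_1 + \cdots + \alpha_1 f_{p-1} + f_p &= 0,\\
&\;\;\vdots\\
\alpha_p h + \alpha_{p-1}h_1 + \cdots + \alpha_1 h_{p-1} + h_p &= 0.
\end{aligned} \qquad (99)$$

This system of equations determines uniquely the coefficients $\alpha_1, \alpha_2, \ldots, \alpha_p$ of the polynomial $\varphi(\lambda)$ (the minimal polynomial of $x$). We now proceed in exact analogy with the regular case, with $n$ replaced by $p$ and the letters $a, b, \ldots, l$ by $c, f, \ldots, h$. We may then eliminate $\alpha_1, \alpha_2, \ldots, \alpha_p$ from (85) and (99) and obtain the following formula for $\varphi(\lambda)$: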


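²¹ Indeed, $a_i = a^{(i)}_{11}a + a^{(i)}_{12}b + \cdots + a^{(i)}_{1n}l$, …, $l_i = a^{(i)}_{n1}a + a^{(i)}_{n2}b + \cdots + a^{(i)}_{nn}l$ $(i = 1, 2, \ldots, n)$, where $a^{(i)}_{jk}$ $(j, k = 1, 2, \ldots, n)$ are the elements of $A^i$ $(i = 1, 2, \ldots, n)$.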




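$$M^*\varphi(\lambda) = \begin{vmatrix}
c & f & \cdots & h & 1\\
c_1 & f_1 & \cdots & h_1 & \lambda\\
\vdots & \vdots & & \vdots & \vdots\\
c_{p-1} & f_{p-1} & \cdots & h_{p-1} & \lambda^{p-1}\\
c_p & f_p & \cdots & h_p & \lambda^p
\end{vmatrix}. \qquad (100)$$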
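The singular-case procedure can be followed step by step in Python. This sketch uses our own helper names and exact fractions. It generates the rows of (97) until the first row that depends on the preceding ones. It then takes as the coordinates $c, f, \ldots, h$ the pivot columns of a row reduction, which guarantees $M^* \ne 0$ as in (98). Finally it solves (99) for $\alpha_1, \ldots, \alpha_p$. The usage line takes the data of Example 2 below and should give $\varphi(\lambda) = \lambda^3 - \lambda^2 - \lambda + 1$, using the first three coordinates.

```python
from fractions import Fraction


def pivot_columns(rows):
    """Row-reduce a copy of `rows`; return the pivot columns (their number is the rank)."""
    M = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return pivots


def solve(M, rhs):
    """Solve the non-singular system M z = rhs by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def vector_min_poly(A, x):
    """Return ([1, alpha_1, ..., alpha_p], cols): the coefficients of phi(lambda),
    the minimal polynomial of x, and the 0-based indices of the p coordinates used."""
    rows = [[Fraction(v) for v in x]]
    if not any(rows[0]):
        raise ValueError("x must be a non-zero vector")
    while True:
        nxt = [sum(a * v for a, v in zip(row, rows[-1])) for row in A]
        if len(pivot_columns(rows + [nxt])) == len(rows):
            break                      # nxt = A^p x depends on x, ..., A^(p-1) x
        rows.append(nxt)
    p = len(rows)
    cols = pivot_columns(rows)         # coordinates c, f, ..., h with M* != 0, as in (98)
    # System (99): alpha_p c + alpha_(p-1) c_1 + ... + alpha_1 c_(p-1) = -c_p, etc.
    M = [[rows[k][c] for k in range(p)] for c in cols]
    z = solve(M, [-nxt[c] for c in cols])   # z = [alpha_p, ..., alpha_1]
    return [Fraction(1)] + z[::-1], cols


A1 = [[3, -1, -4, 2],
      [2, 3, -2, -4],
      [2, -1, -3, 2],
      [1, 2, -1, -3]]
phi, cols = vector_min_poly(A1, [1, 0, 0, 0])    # data of Example 2
```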
4. Let us now clarify the following problem. For what matrices $A = \|a_{ik}\|_1^n$, and for what choice of the original vector $x$ (that is, of the initial parameters $a, b, \ldots, l$), does the regular case hold?


We have seen that in the regular case

$$\Delta(\lambda) \equiv \psi(\lambda) \equiv \varphi(\lambda).$$


The characteristic polynomial $\Delta(\lambda)$ coincides with the minimal polynomial $\psi(\lambda)$. This means that the matrix $A = \|a_{ik}\|_1^n$ has no two elementary divisors with one and the same characteristic value, i.e., all the elementary divisors are co-prime in pairs. When $A$ is a matrix of simple structure, this requirement is equivalent to the condition that the characteristic equation of $A$ have no multiple roots.

The polynomials $\psi(\lambda)$ and $\varphi(\lambda)$ also coincide. This means that for $x$ we have chosen a vector that generates (by means of $A$) the whole space $R$. Such a vector always exists, by Theorem 2 of § 2.

Suppose instead that the condition $\Delta(\lambda) \equiv \psi(\lambda)$ is not satisfied. Then, however we choose the vector $x \ne o$, we do not obtain $\Delta(\lambda)$. The polynomial $\varphi(\lambda)$ obtained by Krylov's method is a divisor of $\psi(\lambda)$, and in this case $\psi(\lambda)$ does not coincide with $\Delta(\lambda)$ but is only a factor of it. By varying the vector $x$ we may obtain for $\varphi(\lambda)$ every divisor of $\psi(\lambda)$.²²

The results we have reached can be stated in the form of the following theorem:


THEOREM 14: Krylov's transformation gives an expression for the characteristic polynomial $\Delta(\lambda)$ of the matrix $A = \|a_{ik}\|_1^n$ in the form of the determinant (93) if and only if two conditions are satisfied:

1. The elementary divisors of $A$ are co-prime in pairs.

2. The initial parameters $a, b, \ldots, l$ are the coordinates of a vector $x$ that generates the whole $n$-dimensional space (by means of the operator $A$ corresponding to the matrix $A$).²³


²² See, for example, [168], p. 48.


²³ In analytical form, this condition means that the columns $x, Ax, \ldots, A^{n-1}x$ are linearly independent, where $x = (a, b, \ldots, l)$.




In general, the Krylov transformation leads to some divisor $\varphi(\lambda)$ of the characteristic polynomial $\Delta(\lambda)$. This divisor $\varphi(\lambda)$ is the minimal polynomial of the vector $x$ with the coordinates $a, b, \ldots, l$ (where $a, b, \ldots, l$ are the initial parameters in the Krylov transformation).


5. Let us show how to find the coordinates of a characteristic vector $y$ for an arbitrary characteristic value $\lambda_0$ which is a root of the polynomial $\varphi(\lambda)$ obtained by Krylov's method.²⁴

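We shall seek a vector $y \ne o$ in the form

$$y = \xi_1 x + \xi_2 Ax + \cdots + \xi_p A^{p-1}x. \qquad (101)$$

Substituting this expression for $y$ in the vector equation

$$Ay = \lambda_0 y$$

and using (83), we obtain

$$\xi_1 Ax + \xi_2 A^2x + \cdots + \xi_{p-1}A^{p-1}x + \xi_p(-\alpha_p x - \alpha_{p-1}Ax - \cdots - \alpha_1 A^{p-1}x) = \lambda_0(\xi_1 x + \xi_2 Ax + \cdots + \xi_p A^{p-1}x). \qquad (102)$$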


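Hence, among other things, it follows that $\xi_p \ne 0$: the equation $\xi_p = 0$ would yield by (102) a linear dependence among the vectors $x, Ax, \ldots, A^{p-1}x$. In what follows we set $\xi_p = 1$. Comparing the coefficients of $A^{p-1}x, A^{p-2}x, \ldots, x$ in (102), we then obtain:

$$\xi_p = 1,\quad \xi_{p-1} = \lambda_0\xi_p + \alpha_1,\quad \xi_{p-2} = \lambda_0\xi_{p-1} + \alpha_2,\quad \ldots,\quad \xi_1 = \lambda_0\xi_2 + \alpha_{p-1},$$
$$0 = \lambda_0\xi_1 + \alpha_p. \qquad (103)$$

The first $p$ of these equations determine in succession the values $\xi_p, \xi_{p-1}, \ldots, \xi_1$. These are the coordinates of $y$ in the 'new' basis $x, Ax, \ldots, A^{p-1}x$. The last equation follows from the preceding ones together with the relation $\lambda_0^p + \alpha_1\lambda_0^{p-1} + \cdots + \alpha_p = 0$.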
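The coordinates $a', b', \ldots, l'$ of the vector $y$ in the original basis are given by the following formulas, which follow from (101):

$$\begin{aligned}
a' &= \xi_1 a + \xi_2 a_1 + \cdots + \xi_p a_{p-1},\\
b' &= \xi_1 b + \xi_2 b_1 + \cdots + \xi_p b_{p-1},\\
&\;\;\vdots\\
l' &= \xi_1 l + \xi_2 l_1 + \cdots + \xi_p l_{p-1}.
\end{aligned} \qquad (104)$$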
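Formulas (103) and (104) translate directly into a short Python sketch; the function name is our own. It runs the recursion for $\xi_p, \ldots, \xi_1$, checks the control equation, and returns both the $\xi_k$ and the coordinates $a', b', \ldots, l'$. On the data of Example 1 below it should reproduce $y \sim (0, 2, 0, 1)$ for $\lambda_0 = 1$ and $z \sim (1, 0, 1, 0)$ for $\lambda_0 = -1$.

```python
def characteristic_vector(krylov_rows, alphas, lam0):
    """Formulas (103) and (104).

    krylov_rows -- [x, Ax, ..., A^(p-1) x], coordinates in the original basis
    alphas      -- [alpha_1, ..., alpha_p], coefficients of phi(lambda)
    lam0        -- a root of phi(lambda)
    Returns (xi, y) with xi = [xi_1, ..., xi_p] and y = xi_1 x + ... + xi_p A^(p-1) x.
    """
    p = len(alphas)
    xi = [0] * (p + 1)                  # 1-based: xi[1], ..., xi[p]; xi[0] unused
    xi[p] = 1
    for k in range(p - 1, 0, -1):
        xi[k] = lam0 * xi[k + 1] + alphas[p - k - 1]    # xi_k = lam0 xi_(k+1) + alpha_(p-k)
    if lam0 * xi[1] + alphas[p - 1] != 0:               # control equation in (103)
        raise ValueError("lam0 is not a root of phi")
    n = len(krylov_rows[0])
    y = [sum(xi[k + 1] * krylov_rows[k][i] for k in range(p)) for i in range(n)]   # (104)
    return xi[1:], y


# Rows x, Ax, A^2 x, A^3 x of Example 1 and phi = Delta = lambda^4 - 2 lambda^2 + 1.
rows = [[1, 1, 0, 0], [2, 5, 1, 3], [3, 5, 2, 2], [0, 9, -1, 5]]
xi, y = characteristic_vector(rows, [0, -2, 0, 1], 1)    # y is proportional to (0, 2, 0, 1)
xi2, z = characteristic_vector(rows, [0, -2, 0, 1], -1)  # z is proportional to (1, 0, 1, 0)
```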

Example 1.

We recommend to the reader the following scheme of computations.


²⁴ The following arguments hold both in the regular case $p = n$ and in the singular case $p < n$.




Under the given matrix $A$ we write the row of the coordinates of $x$: $a, b, \ldots, l$. These numbers are given arbitrarily, subject only to the condition that at least one of them is different from zero. Under the row $a, b, \ldots, l$ we write the row $a_1, b_1, \ldots, l_1$, i.e., the coordinates of the vector $Ax$. The numbers $a_1, b_1, \ldots, l_1$ are obtained by multiplying the row $a, b, \ldots, l$ successively into the rows of the given matrix $A$. For example, $a_1 = a_{11}a + a_{12}b + \cdots + a_{1n}l$, $b_1 = a_{21}a + a_{22}b + \cdots + a_{2n}l$, etc. Under the row $a_1, b_1, \ldots, l_1$ we write the row $a_2, b_2, \ldots, l_2$, etc. Each of the rows, beginning with the second, is determined by multiplying the preceding row successively into the rows of the given matrix.

Above the given matrix we write the sum row as a check. Its elements are the column sums of $A$, so the sum of the elements of each new row must equal the product of the preceding row into the sum row.


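$$\begin{array}{l|rrrr|rr}
\text{sum row} & 8 & 3 & -10 & -3 & & \\ \hline
 & 3 & -1 & -4 & 2 & & \\
A & 2 & 3 & -2 & -4 & & \\
 & 2 & -1 & -3 & 2 & & \\
 & 1 & 2 & -1 & -3 & & \\ \hline
 & & & & & \xi_k\ (\lambda_0 = 1) & \xi_k\ (\lambda_0 = -1)\\
x = e_1 + e_2 & 1 & 1 & 0 & 0 & -1 & 1\\
Ax & 2 & 5 & 1 & 3 & -1 & -1\\
A^2x & 3 & 5 & 2 & 2 & 1 & -1\\
A^3x & 0 & 9 & -1 & 5 & 1 & 1\\
A^4x & 5 & 9 & 4 & 4 & & \\ \hline
4y & 0 & 8 & 0 & 4 & & \\
y & 0 & 2 & 0 & 1 & & \\
-4z & -4 & 0 & -4 & 0 & & \\
z & 1 & 0 & 1 & 0 & & 
\end{array}$$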
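The given case is regular, because

$$M = \begin{vmatrix} 1 & 1 & 0 & 0\\ 2 & 5 & 1 & 3\\ 3 & 5 & 2 & 2\\ 0 & 9 & -1 & 5 \end{vmatrix} = -16 \ne 0.$$

Krylov's determinant has the form

$$-16\,\Delta(\lambda) = \begin{vmatrix} 1 & 1 & 0 & 0 & 1\\ 2 & 5 & 1 & 3 & \lambda\\ 3 & 5 & 2 & 2 & \lambda^2\\ 0 & 9 & -1 & 5 & \lambda^3\\ 5 & 9 & 4 & 4 & \lambda^4 \end{vmatrix}.$$

Expanding this determinant and cancelling $-16$ we find:

$$\Delta(\lambda) = \lambda^4 - 2\lambda^2 + 1 = (\lambda^2 - 1)^2.$$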




We denote by

$$y = \xi_1 x + \xi_2 Ax + \xi_3 A^2x + \xi_4 A^3x$$

a characteristic vector of $A$ corresponding to the characteristic value $\lambda_0 = 1$. We find the numbers $\xi_4, \xi_3, \xi_2, \xi_1$ by the formulas (103):

$$\xi_4 = 1,\quad \xi_3 = 1\cdot 1 + 0 = 1,\quad \xi_2 = 1\cdot 1 - 2 = -1,\quad \xi_1 = 1\cdot(-1) + 0 = -1.$$

The control equation $1\cdot(-1) + 1 = 0$ is, of course, satisfied.

We write the numbers $\xi_1, \xi_2, \xi_3, \xi_4$ in a vertical column alongside the rows $x, Ax, A^2x, A^3x$. We multiply this column into the column $a, a_1, a_2, a_3$. This gives the first coordinate $a'$ of the vector $y$ in the original basis $e_1, e_2, e_3, e_4$; similarly we obtain $b', c', d'$. After cancelling by 4, we find the coordinates of $y$: $0, 2, 0, 1$. In the same way we determine the coordinates $1, 0, 1, 0$ of a characteristic vector $z$ for the characteristic value $\lambda_0 = -1$.

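Furthermore, by (94) and (95),

$$A = T\tilde{A}T^{-1},$$

where

$$\tilde{A} = \begin{pmatrix} 0 & 0 & 0 & -1\\ 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 2\\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad T = \begin{pmatrix} 1 & 2 & 3 & 0\\ 1 & 5 & 5 & 9\\ 0 & 1 & 2 & -1\\ 0 & 3 & 2 & 5 \end{pmatrix}.$$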


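Example 2. We consider the same matrix $A$, but as initial parameters we take the numbers $a = 1$, $b = 0$, $c = 0$, $d = 0$.

$$\begin{array}{l|rrrr}
\text{sum row} & 8 & 3 & -10 & -3\\ \hline
 & 3 & -1 & -4 & 2\\
A & 2 & 3 & -2 & -4\\
 & 2 & -1 & -3 & 2\\
 & 1 & 2 & -1 & -3\\ \hline
x = e_1 & 1 & 0 & 0 & 0\\
Ax & 3 & 2 & 2 & 1\\
A^2x & 1 & 4 & 0 & 2\\
A^3x & 3 & 6 & 2 & 3
\end{array}$$

But in this case

$$M = \begin{vmatrix} 1 & 0 & 0 & 0\\ 3 & 2 & 2 & 1\\ 1 & 4 & 0 & 2\\ 3 & 6 & 2 & 3 \end{vmatrix} = 0,$$

and we are in the singular case. Since the first three rows are linearly independent, $p = 3$.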




Taking the first three coordinates of the vectors $x, Ax, A^2x, A^3x$, we write the Krylov determinant in the form

$$\begin{vmatrix} 1 & 0 & 0 & 1\\ 3 & 2 & 2 & \lambda\\ 1 & 4 & 0 & \lambda^2\\ 3 & 6 & 2 & \lambda^3 \end{vmatrix}.$$

Expanding this determinant and cancelling $-8$, we obtain:

$$\varphi(\lambda) = \lambda^3 - \lambda^2 - \lambda + 1 = (\lambda - 1)^2(\lambda + 1).$$

Hence we find three characteristic values: $\lambda_1 = 1$, $\lambda_2 = 1$, $\lambda_3 = -1$. The fourth characteristic value follows from the condition that the sum of all the characteristic values equals the trace of the matrix. But $\operatorname{tr} A = 0$. Hence $\lambda_4 = -1$.

These examples show that in applying Krylov's method, when we write down successively the rows of the matrix

$$\begin{pmatrix} a & b & \cdots & l\\ a_1 & b_1 & \cdots & l_1\\ a_2 & b_2 & \cdots & l_2\\ \vdots & \vdots & & \vdots \end{pmatrix} \qquad (105)$$

we must watch the rank of the matrix obtained. We stop after the first row (the $(p+1)$-st from the top) that is a linear combination of the preceding ones. Determining the rank requires the computation of certain determinants. Moreover, once Krylov's determinant has been obtained in the form (93) or (100), expanding it along the last column requires a certain number of determinants of order $p - 1$ (in the regular case, of order $n - 1$).

Instead of expanding Krylov's determinant we can determine the coefficients $\alpha_1, \alpha_2, \ldots$ directly from the system of equations (91) (or (99)). Any efficient method of solution will do, for example, the elimination method. This method can be applied immediately to the matrix

$$\begin{pmatrix} a & b & \cdots & l & 1\\ a_1 & b_1 & \cdots & l_1 & \lambda\\ a_2 & b_2 & \cdots & l_2 & \lambda^2\\ \vdots & \vdots & & \vdots & \vdots \end{pmatrix} \qquad (106)$$

in parallel with the computation of the corresponding rows by Krylov's method. We shall then discover at once a row of the matrix (105)




that depends on the preceding ones, without computing any determinant. 

Let us explain this in some detail. In the first row of (106) we take an arbitrary element $c \ne 0$. We use it to make the element $c_1$ under it zero, by subtracting from the second row the first row multiplied by $c_1/c$. Next we take an element $f_1^* \ne 0$ in the second row, and by means of $c$ and $f_1^*$ we make the elements $c_2$ and $f_2$ zero, and so on.²⁵ As a result of this transformation, the element $\lambda^k$ in the last column of (106) is replaced by a polynomial of degree $k$, $g_k(\lambda) = \lambda^k + \cdots$ $(k = 0, 1, 2, \ldots)$.

Our transformation does not change the rank of the matrix formed from the first $k$ rows (for any $k$) and the first $n$ columns of (106). Hence the $(p+1)$-st row of the matrix must, after the transformation, have the form

$$0, 0, \ldots, 0, g_p(\lambda).$$


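Our transformation does not change the value of the Krylov determinant

$$\begin{vmatrix} c & f & \cdots & h & 1\\ c_1 & f_1 & \cdots & h_1 & \lambda\\ \vdots & \vdots & & \vdots & \vdots\\ c_{p-1} & f_{p-1} & \cdots & h_{p-1} & \lambda^{p-1}\\ c_p & f_p & \cdots & h_p & \lambda^p \end{vmatrix} = M^*\varphi(\lambda).$$

Therefore

$$M^*\varphi(\lambda) = c\,f_1^* \cdots g_p(\lambda), \qquad (107)$$

i.e.,²⁶ $g_p(\lambda)$ is the required polynomial $\varphi(\lambda)$: $g_p(\lambda) \equiv \varphi(\lambda)$.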
We recommend the following simplification. Suppose we have obtained the $k$-th transformed row of (106)

$$a_{k-1}^*, b_{k-1}^*, \ldots, l_{k-1}^*, g_{k-1}(\lambda). \qquad (108)$$

The next, $(k+1)$-st, row should then be obtained by multiplying $a_{k-1}^*, b_{k-1}^*, \ldots, l_{k-1}^*$ (and not the original $a_{k-1}, b_{k-1}, \ldots, l_{k-1}$) into the rows of the given matrix.²⁷ This gives the $(k+1)$-st row in the form

$$\tilde a_k, \tilde b_k, \ldots, \tilde l_k, \lambda g_{k-1}(\lambda),$$

and after subtracting the preceding rows, we obtain:


²⁵ The elements $c, f_1^*, \ldots$ must not belong to the last column containing the powers of $\lambda$.
²⁶ We recall that the highest coefficients of $\varphi(\lambda)$ and $g_p(\lambda)$ are 1.


²⁷ The simplification consists in the fact that in the row (108) to be transformed $k - 1$ elements are equal to zero. Therefore it is simple to multiply such a row into the rows of $A$.




$$a_k^*, b_k^*, \ldots, l_k^*, g_k(\lambda).$$

The slight modification of Krylov's method that we have recommended is its combination with the elimination method. It finds at once the polynomial $\varphi(\lambda)$ that we are interested in (in the regular case, $\Delta(\lambda)$). No determinants need to be computed and no auxiliary system of equations needs to be solved.²⁸


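Example.

$$\begin{array}{l|rrrrr|l}
\text{sum row} & 4 & 4 & 1 & 5 & 0 & \\ \hline
 & 1 & 1 & -1 & 1 & 0 & \\
 & 1 & 2 & -1 & 0 & 1 & \\
A & -1 & 2 & 3 & -1 & 0 & \\
 & 1 & -2 & 1 & 2 & -1 & \\
 & 2 & 1 & -1 & 3 & 0 & \\ \hline
x = e_5 & 0 & 0 & 0 & 0 & 1 & 1\\
 & 0 & 1 & 0 & -1 & 0 & \lambda\\
 & 0 & 2 & 3 & -4 & -2 & \lambda^2 \quad [2 - 4\lambda]\\
 & 0 & -2 & 3 & 0 & 0 & \lambda^2 - 4\lambda + 2\\
 & -5 & -7 & 5 & 7 & -5 & \lambda^3 - 4\lambda^2 + 2\lambda \quad [5 + 7\lambda]\\
 & -5 & 0 & 5 & 0 & 0 & \lambda^3 - 4\lambda^2 + 9\lambda + 5\\
 & -10 & -10 & 20 & 0 & -15 & \lambda^4 - 4\lambda^3 + 9\lambda^2 + 5\lambda \quad [15 - 5(\lambda^2 - 4\lambda + 2) - 2(\lambda^3 - 4\lambda^2 + 9\lambda + 5)]\\
 & 0 & 0 & -5 & 0 & 0 & \lambda^4 - 6\lambda^3 + 12\lambda^2 + 7\lambda - 5\\
 & 5 & 5 & -15 & -5 & 5 & \lambda^5 - 6\lambda^4 + 12\lambda^3 + 7\lambda^2 - 5\lambda \quad [-5 - 5\lambda + (\lambda^3 - 4\lambda^2 + 9\lambda + 5) - 2(\lambda^4 - 6\lambda^3 + 12\lambda^2 + 7\lambda - 5)]\\
 & 0 & 0 & 0 & 0 & 0 & \lambda^5 - 8\lambda^4 + 25\lambda^3 - 21\lambda^2 - 15\lambda + 10 = \Delta(\lambda)
\end{array}$$

(The brackets contain the terms added to the last column when the preceding transformed rows are subtracted.)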
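The combined scheme can be sketched in Python with exact fractions; the helper names are our own. The polynomial in the last column of (106) is stored as a coefficient list, lowest power first. Each new row is reduced against the earlier transformed rows. The next row is then obtained by multiplying the transformed row into the rows of $A$, while its polynomial is multiplied by $\lambda$, as recommended above. The first row that reduces to zero carries $g_p(\lambda) = \varphi(\lambda)$. We simply take the first non-zero element of each row as its pivot. This differs from the choices made in the hand computation of the example above, but the final polynomial is the same, because it is the minimal polynomial of $x$.

```python
from fractions import Fraction


def poly_axpy(p, q, s):
    """Return p + s*q for coefficient lists (lowest power first)."""
    r = [Fraction(0)] * max(len(p), len(q))
    for i, c in enumerate(p):
        r[i] += c
    for i, c in enumerate(q):
        r[i] += s * c
    return r


def krylov_elimination(A, x):
    """Krylov's rows combined with elimination on the matrix (106).

    Returns phi(lambda), the minimal polynomial of x, as a coefficient list,
    lowest power first. Invariant: each current row equals g(A) x, where g is
    the polynomial kept in its last column."""
    row = [Fraction(v) for v in x]
    if not any(row):
        raise ValueError("x must be a non-zero vector")
    poly = [Fraction(1)]                       # g_0(lambda) = 1
    pivots = []                                # (column, transformed row, its polynomial)
    while True:
        for c, prow, ppoly in pivots:          # subtract the preceding transformed rows
            if row[c] != 0:
                f = row[c] / prow[c]
                row = [a - f * b for a, b in zip(row, prow)]
                poly = poly_axpy(poly, ppoly, -f)
        col = next((j for j, v in enumerate(row) if v != 0), None)
        if col is None:                        # row is 0, 0, ..., 0, g_p(lambda)
            return poly
        pivots.append((col, row, poly))
        # Next row: multiply the *transformed* row into the rows of A; g -> lambda * g.
        row = [sum(a * v for a, v in zip(arow, row)) for arow in A]
        poly = [Fraction(0)] + poly


# The 5 x 5 matrix of the example above, with x = e_5.
A5 = [[1, 1, -1, 1, 0],
      [1, 2, -1, 0, 1],
      [-1, 2, 3, -1, 0],
      [1, -2, 1, 2, -1],
      [2, 1, -1, 3, 0]]
delta = krylov_elimination(A5, [0, 0, 0, 0, 1])
# lowest power first: 10 - 15 l - 21 l^2 + 25 l^3 - 8 l^4 + l^5
```

Applied to the data of Example 2, the same function stops after three rows and returns the coefficients of $\lambda^3 - \lambda^2 - \lambda + 1$.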


²⁸ Apart from the method of Krylov, we have acquainted the reader in Chapter IV with the method of D. K. Faddeev for the computation of the coefficients of the characteristic polynomial. Faddeev's method involves more computations than Krylov's, but it is more general, having no singular cases. We also refer the reader to the very effective method of A. M. Danilevskii [131]; see also the expository paper [376] and the book [15], § 24. See also [5] and [194].


CHAPTER VIII 


MATRIX EQUATIONS 


In this chapter we consider certain types of matrix equations that occur in 
various problems in the theory of matrices and its applications. 


§ 1. The Equation AX = XB 


1. Suppose that the equation

$$AX = XB \qquad (1)$$

is given, where $A$ and $B$ are square matrices (in general of different orders)

$$A = \|a_{ij}\|_1^m, \qquad B = \|b_{kl}\|_1^n$$

and where $X$ is an unknown rectangular matrix of dimension $m \times n$:

$$X = \|x_{jk}\| \qquad (j = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n).$$


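We write down the elementary divisors of $A$ and $B$ (in the field of complex numbers):

$$(A):\ (\lambda - \lambda_1)^{p_1}, (\lambda - \lambda_2)^{p_2}, \ldots, (\lambda - \lambda_u)^{p_u} \qquad (p_1 + p_2 + \cdots + p_u = m),$$

$$(B):\ (\lambda - \mu_1)^{q_1}, (\lambda - \mu_2)^{q_2}, \ldots, (\lambda - \mu_v)^{q_v} \qquad (q_1 + q_2 + \cdots + q_v = n).$$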


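In accordance with these elementary divisors we reduce $A$ and $B$ to Jordan normal form

$$A = U\tilde{A}U^{-1}, \qquad B = V\tilde{B}V^{-1}, \qquad (2)$$

where $U$ and $V$ are square non-singular matrices of orders $m$ and $n$, respectively, and $\tilde{A}$ and $\tilde{B}$ are the Jordan matrices:

$$\begin{aligned}
\tilde{A} &= \{\lambda_1E^{(p_1)} + H^{(p_1)},\ \lambda_2E^{(p_2)} + H^{(p_2)},\ \ldots,\ \lambda_uE^{(p_u)} + H^{(p_u)}\},\\
\tilde{B} &= \{\mu_1E^{(q_1)} + H^{(q_1)},\ \mu_2E^{(q_2)} + H^{(q_2)},\ \ldots,\ \mu_vE^{(q_v)} + H^{(q_v)}\}.
\end{aligned} \qquad (3)$$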


216 VIII. Marrix Equations 
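Replacing $A$ and $B$ in (1) by their expressions given in (2), we obtain:

$$U\tilde{A}U^{-1}X = XV\tilde{B}V^{-1}.$$

We multiply both sides of this equation on the left by $U^{-1}$ and on the right by $V$:

$$\tilde{A}U^{-1}XV = U^{-1}XV\tilde{B}. \qquad (4)$$

We now introduce in place of $X$ a new unknown matrix $\tilde{X}$ of the same dimension $m \times n$:

$$\tilde{X} = U^{-1}XV. \qquad (5)$$

Equation (4) can then be written as follows:

$$\tilde{A}\tilde{X} = \tilde{X}\tilde{B}. \qquad (6)$$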
We have thus replaced the matrix equation (1) by the equation (6), of 
the same form, in which the given matrices have Jordan normal form. 


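We partition $\tilde{X}$ into blocks corresponding to the quasi-diagonal form of the matrices $\tilde{A}$ and $\tilde{B}$:

$$\tilde{X} = (X_{\alpha\beta}) \qquad (\alpha = 1, 2, \ldots, u;\ \beta = 1, 2, \ldots, v)$$

(here $X_{\alpha\beta}$ is a rectangular matrix of dimension $p_\alpha \times q_\beta$ $(\alpha = 1, 2, \ldots, u;\ \beta = 1, 2, \ldots, v)$).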

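Using the rule for multiplying a partitioned matrix by a quasi-diagonal one (see p. 42), we carry out the multiplication of the matrices on the left-hand and right-hand sides of (6). This equation then breaks up into $uv$ matrix equations

$$[\lambda_\alpha E^{(p_\alpha)} + H^{(p_\alpha)}]X_{\alpha\beta} = X_{\alpha\beta}[\mu_\beta E^{(q_\beta)} + H^{(q_\beta)}] \qquad (\alpha = 1, 2, \ldots, u;\ \beta = 1, 2, \ldots, v),$$

which we rewrite as follows:

$$(\mu_\beta - \lambda_\alpha)X_{\alpha\beta} = H_\alpha X_{\alpha\beta} - X_{\alpha\beta}G_\beta \qquad (\alpha = 1, 2, \ldots, u;\ \beta = 1, 2, \ldots, v); \qquad (7)$$

here we have used the abbreviations

$$H_\alpha = H^{(p_\alpha)}, \qquad G_\beta = H^{(q_\beta)} \qquad (\alpha = 1, 2, \ldots, u;\ \beta = 1, 2, \ldots, v). \qquad (8)$$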


Let us take one of the equations (7). Two cases can occur:


1. $\lambda_\alpha \ne \mu_\beta$. We iterate equation (7) $r - 1$ times:¹


¹ We multiply both sides of (7) by $\mu_\beta - \lambda_\alpha$ and in each term of the right-hand side we replace $(\mu_\beta - \lambda_\alpha)X_{\alpha\beta}$ by $H_\alpha X_{\alpha\beta} - X_{\alpha\beta}G_\beta$. This process is repeated $r - 1$ times.


§1. Toe Equation AX = XB 217 


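$$(\mu_\beta - \lambda_\alpha)^r X_{\alpha\beta} = \sum_{\sigma + \tau = r} (-1)^\tau \binom{r}{\tau} H_\alpha^\sigma X_{\alpha\beta} G_\beta^\tau. \qquad (9)$$

Note that, by (8),

$$H_\alpha^{p_\alpha} = G_\beta^{q_\beta} = O. \qquad (10)$$

If in (9) we take $r = p_\alpha + q_\beta - 1$, then each term of the sum on the right-hand side of (9) satisfies at least one of the relations

$$\sigma \ge p_\alpha, \qquad \tau \ge q_\beta,$$

so that by (10) either $H_\alpha^\sigma = O$ or $G_\beta^\tau = O$. Moreover, since in this case $\lambda_\alpha \ne \mu_\beta$, we find from (9):

$$X_{\alpha\beta} = O. \qquad (11)$$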
2. $\lambda_\alpha = \mu_\beta$. In this case equation (7) assumes the form

$$H_\alpha X_{\alpha\beta} = X_{\alpha\beta}G_\beta. \qquad (12)$$

In the matrices $H_\alpha$ and $G_\beta$ the elements of the first superdiagonal are equal to 1, and all the remaining elements are zero. We take this specific structure of $H_\alpha$ and $G_\beta$ into account and set

$$X_{\alpha\beta} = \|\xi_{ik}\| \qquad (i = 1, 2, \ldots, p_\alpha;\ k = 1, 2, \ldots, q_\beta).$$

The matrix equation (12) is then equivalent to the following system of scalar equations:²

$$\xi_{i+1,k} = \xi_{i,k-1} \qquad (\xi_{i0} = \xi_{p_\alpha+1,k} = 0;\ i = 1, 2, \ldots, p_\alpha;\ k = 1, 2, \ldots, q_\beta). \qquad (13)$$


The equations (13) have this meaning:


1) In the matrix $X_{\alpha\beta}$ the elements of every line parallel to the main diagonal are equal;


2) $\xi_{21} = \xi_{31} = \cdots = \xi_{p_\alpha 1} = \xi_{p_\alpha 2} = \cdots = \xi_{p_\alpha, q_\beta - 1} = 0$.


Let $p_\alpha = q_\beta$. Then $X_{\alpha\beta}$ is a square matrix. From 1) and 2) it follows that in $X_{\alpha\beta}$ all the elements below the main diagonal are zero, all the elements in the main diagonal are equal to a certain number $c_{\alpha\beta}$, all the elements of the first superdiagonal are equal to a number $c'_{\alpha\beta}$, etc.; i.e.,


² From the structure of the matrices $H_\alpha$ and $G_\beta$ it follows that the product $H_\alpha X_{\alpha\beta}$ is obtained from $X_{\alpha\beta}$ by shifting all the rows one place upwards and filling the last row with zeros. Similarly, $X_{\alpha\beta}G_\beta$ is obtained from $X_{\alpha\beta}$ by shifting all the columns one place to the right and filling the first column with zeros (see Chapter I, p. 14). To simplify the notation we do not write the additional indices $\alpha, \beta$ in $\xi_{ik}$.




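$$X_{\alpha\beta} = \begin{pmatrix}
c_{\alpha\beta} & c'_{\alpha\beta} & \cdots & c^{(p_\alpha-1)}_{\alpha\beta}\\
0 & c_{\alpha\beta} & \ddots & \vdots\\
\vdots & \ddots & \ddots & c'_{\alpha\beta}\\
0 & \cdots & 0 & c_{\alpha\beta}
\end{pmatrix} = T_{p_\alpha}; \qquad (14)$$

here $c_{\alpha\beta}, c'_{\alpha\beta}, \ldots, c^{(p_\alpha-1)}_{\alpha\beta}$ are arbitrary parameters (the equations (12) do not impose any restrictions on the values of these parameters).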
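It is easy to see that for $p_\alpha < q_\beta$

$$X_{\alpha\beta} = (O,\ T_{p_\alpha}) \qquad (15)$$

(the zero block $O$ has $q_\beta - p_\alpha$ columns), and for $p_\alpha > q_\beta$

$$X_{\alpha\beta} = \begin{pmatrix} T_{q_\beta}\\ O \end{pmatrix} \qquad (16)$$

(the zero block $O$ has $p_\alpha - q_\beta$ rows).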

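We shall say of the matrices (14), (15), and (16) that they have regular upper triangular form. The number of arbitrary parameters in $X_{\alpha\beta}$ is equal to the smaller of the numbers $p_\alpha$ and $q_\beta$. The scheme below shows the structure of the matrices $X_{\alpha\beta}$ for $\lambda_\alpha = \mu_\beta$ (the arbitrary parameters are here denoted by $a$, $b$, $c$, and $d$):

$$\begin{pmatrix} a & b & c & d\\ 0 & a & b & c\\ 0 & 0 & a & b\\ 0 & 0 & 0 & a \end{pmatrix}, \qquad
\begin{pmatrix} 0 & 0 & a & b & c\\ 0 & 0 & 0 & a & b\\ 0 & 0 & 0 & 0 & a \end{pmatrix}, \qquad
\begin{pmatrix} a & b & c\\ 0 & a & b\\ 0 & 0 & a\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}$$

$$(p_\alpha = q_\beta = 4) \qquad\qquad (p_\alpha = 3,\ q_\beta = 5) \qquad\qquad (p_\alpha = 5,\ q_\beta = 3)$$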
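The regular upper triangular blocks (14)–(16) are easy to generate, and equation (12) can be checked on them directly. In the Python sketch below, `shift_matrix(k)` plays the role of $H^{(k)}$. The helper `regular_upper_triangular`, a name of ours, places the parameters $c, c', c'', \ldots$ on the appropriate diagonals. The numbers $1, 2, 3$ stand for the parameters $a, b, c$ in the scheme above.

```python
def shift_matrix(k):
    """H^(k): ones on the first superdiagonal, zeros elsewhere."""
    return [[1 if j == i + 1 else 0 for j in range(k)] for i in range(k)]


def mat_mul(P, Q):
    """Ordinary matrix product P Q."""
    return [[sum(P[i][r] * Q[r][j] for r in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def regular_upper_triangular(p, q, params):
    """p x q block of the form (14), (15) or (16).

    params = [c, c', c'', ...] must contain exactly min(p, q) values."""
    if len(params) != min(p, q):
        raise ValueError("need exactly min(p, q) parameters")
    offset = max(q - p, 0)              # number of columns in the zero block O of (15)
    X = [[0] * q for _ in range(p)]
    for i in range(p):
        for k in range(q):
            d = k - i - offset          # which superdiagonal of T the entry lies on
            if d >= 0:
                X[i][k] = params[d]
    return X


X35 = regular_upper_triangular(3, 5, [1, 2, 3])   # second scheme, with a, b, c = 1, 2, 3
X53 = regular_upper_triangular(5, 3, [1, 2, 3])   # third scheme
```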


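Case 1 can also be brought into the count of arbitrary parameters in $\tilde{X}$. For this purpose we denote by $d_{\alpha\beta}(\lambda)$ the greatest common divisor of the elementary divisors $(\lambda - \lambda_\alpha)^{p_\alpha}$ and $(\lambda - \mu_\beta)^{q_\beta}$, and by $\delta_{\alpha\beta}$ the degree of the polynomial $d_{\alpha\beta}(\lambda)$ $(\alpha = 1, 2, \ldots, u;\ \beta = 1, 2, \ldots, v)$. In case 1, we have $\delta_{\alpha\beta} = 0$; in case 2, $\delta_{\alpha\beta} = \min(p_\alpha, q_\beta)$. Thus, in both cases the number of arbitrary parameters in $X_{\alpha\beta}$ is equal to $\delta_{\alpha\beta}$. The number of arbitrary parameters in $\tilde{X}$ is determined by the formula

$$N = \sum_{\alpha=1}^{u}\sum_{\beta=1}^{v}\delta_{\alpha\beta}.$$

In what follows it will be convenient to denote the general solution of (6) by $X_{\tilde A\tilde B}$ (so far we have denoted it by $\tilde{X}$).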




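The results obtained in this section can be stated in the form of the following theorem:

THEOREM 1: The general solution of the matrix equation

$$AX = XB,$$

where

$$\begin{aligned}
A &= \|a_{ik}\|_1^m = U\tilde{A}U^{-1} = U\{\lambda_1E^{(p_1)} + H^{(p_1)}, \ldots, \lambda_uE^{(p_u)} + H^{(p_u)}\}U^{-1},\\
B &= \|b_{jl}\|_1^n = V\tilde{B}V^{-1} = V\{\mu_1E^{(q_1)} + H^{(q_1)}, \ldots, \mu_vE^{(q_v)} + H^{(q_v)}\}V^{-1},
\end{aligned}$$

is given by the formula

$$X = UX_{\tilde A\tilde B}V^{-1}. \qquad (17)$$

Here $X_{\tilde A\tilde B}$ is the general solution of the equation

$$\tilde{A}\tilde{X} = \tilde{X}\tilde{B}$$

and has the following structure: $X_{\tilde A\tilde B}$ is decomposed into blocks

$$X_{\tilde A\tilde B} = (X_{\alpha\beta}) \qquad (X_{\alpha\beta} \text{ of dimension } p_\alpha \times q_\beta;\ \alpha = 1, 2, \ldots, u;\ \beta = 1, 2, \ldots, v);$$

if $\lambda_\alpha \ne \mu_\beta$, then the null matrix stands in the place $X_{\alpha\beta}$, but if $\lambda_\alpha = \mu_\beta$, then an arbitrary regular upper triangular matrix stands in the place $X_{\alpha\beta}$.

$X_{\tilde A\tilde B}$, and therefore also $X$, depends linearly on $N$ arbitrary parameters $c_1, c_2, \ldots, c_N$:

$$X = \sum_{j=1}^{N} c_jX_j, \qquad (18)$$

where $N$ is determined by the formula

$$N = \sum_{\alpha=1}^{u}\sum_{\beta=1}^{v}\delta_{\alpha\beta} \qquad (19)$$

(here $\delta_{\alpha\beta}$ denotes the degree of the greatest common divisor of $(\lambda - \lambda_\alpha)^{p_\alpha}$ and $(\lambda - \mu_\beta)^{q_\beta}$).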

Note that the matrices $X_1, X_2, \ldots, X_N$ that occur in (18) are solutions of the original equation (1). Here $X_j$ is obtained from $X$ by giving the parameter $c_j$ the value 1 and the remaining parameters the value 0 ($j = 1, 2, \ldots, N$). These solutions are linearly independent. Otherwise, for certain values of the parameters $c_1, c_2, \ldots, c_N$, not all zero, the matrix $X$, and therefore $X_{\tilde A\tilde B}$, would be the null matrix, which is impossible. Thus (18) shows that every solution of the original equation is a linear combination of $N$ linearly independent solutions.




Suppose the matrices $A$ and $B$ do not have common characteristic values, i.e., the characteristic polynomials $|\lambda E - A|$ and $|\lambda E - B|$ are co-prime. Then $N = \sum_{\alpha=1}^{u}\sum_{\beta=1}^{v}\delta_{\alpha\beta} = 0$, and so $X = O$. In this case the equation (1) has only the trivial solution $X = O$.

Note. Suppose that the elements of $A$ and $B$ belong to some number field $F$. Then we cannot say that the elements of $U$, $V$, and $X_{\tilde A\tilde B}$ that occur in (17) also belong to $F$. The elements of these matrices may be taken in an extension field $F_1$, obtained from $F$ by adjoining the roots of the characteristic equations $|\lambda E - A| = 0$ and $|\lambda E - B| = 0$. We always have to deal with such an extension of the ground field when we reduce given matrices to Jordan normal form.

However, the matrix equation (1) is equivalent to a system of $mn$ linear homogeneous equations. The unknowns are the elements $x_{jk}$ $(j = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n)$ of the required matrix $X$:

$$\sum_{j=1}^{m} a_{ij}x_{jk} = \sum_{l=1}^{n} x_{il}b_{lk} \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n). \qquad (20)$$

We have shown that this system has $N$ linearly independent solutions, where $N$ is determined by (19). But it is well known that a fundamental system of linearly independent solutions can be chosen in the ground field $F$ to which the coefficients of (20) belong. Thus, in (18) the matrices $X_1, X_2, \ldots, X_N$ can be so chosen that their elements lie in $F$. If we then give the arbitrary parameters in (18) all possible values in $F$, we obtain all the matrices $X$ with elements in $F$ that satisfy the equation (1).³
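Formula (19) can be compared with a direct count from system (20). The Python sketch below uses helpers of our own and exact arithmetic. It builds the $mn \times mn$ coefficient matrix of (20), finds its rank by Gaussian elimination, and reports $mn - \text{rank}$, the number of linearly independent solutions. It works with Jordan matrices, where (19) reduces to summing $\min(p_\alpha, q_\beta)$ over the pairs with $\lambda_\alpha = \mu_\beta$. The test case uses elementary divisors $(\lambda-\lambda_1)^4, (\lambda-\lambda_1)^3, (\lambda-\lambda_2)^2, \lambda-\lambda_2$, as in the example of § 2, with the values $\lambda_1 = 2$, $\lambda_2 = 5$ chosen arbitrarily. Both counts should give 18.

```python
from fractions import Fraction


def jordan(blocks):
    """Jordan matrix {lam E^(p) + H^(p), ...} for blocks = [(lam, p), ...]."""
    n = sum(p for _, p in blocks)
    J = [[0] * n for _ in range(n)]
    start = 0
    for lam, p in blocks:
        for i in range(start, start + p):
            J[i][i] = lam
            if i + 1 < start + p:
                J[i][i + 1] = 1
        start += p
    return J


def rank(rows):
    """Rank of a matrix by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            if M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b if b else a for a, b in zip(M[i], M[r])]
        r += 1
    return r


def solution_count(A, B):
    """Number of linearly independent solutions of AX = XB, from the system (20)."""
    m, n = len(A), len(B)
    equations = []
    for i in range(m):
        for k in range(n):
            eq = [0] * (m * n)                 # unknown x_jk sits at index j*n + k
            for j in range(m):
                eq[j * n + k] += A[i][j]       # sum_j a_ij x_jk
            for l in range(n):
                eq[i * n + l] -= B[l][k]       # - sum_l x_il b_lk
            equations.append(eq)
    return m * n - rank(equations)


def formula_19_count(blocks_a, blocks_b):
    """N = sum of delta = min(p, q) over pairs of blocks with equal characteristic values."""
    return sum(min(p, q) for lam, p in blocks_a for mu, q in blocks_b if lam == mu)


blocks = [(2, 4), (2, 3), (5, 2), (5, 1)]
J = jordan(blocks)
```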


§ 2. The Special Case A = B. Commuting Matrices


1. Let us consider the special case of the equation (1):

$$AX = XA, \qquad (21)$$

where $A = \|a_{ik}\|_1^n$ is a given matrix and $X = \|x_{ik}\|_1^n$ an unknown matrix. We have come to a problem of Frobenius: to determine all the matrices $X$ that commute with a given matrix $A$.

We reduce $A$ to Jordan normal form:

$$A = U\tilde{A}U^{-1} = U\{\lambda_1E^{(p_1)} + H^{(p_1)}, \ldots, \lambda_uE^{(p_u)} + H^{(p_u)}\}U^{-1}. \qquad (22)$$
³ The matrices $A = \|a_{ik}\|_1^m$ and $B = \|b_{kl}\|_1^n$ determine a linear operator $F(X) = AX - XB$ in the space of rectangular matrices $X$ of dimension $m \times n$. A treatment of operators of this type is contained in the paper [179].


§2. Tur Specra, Cass 4=B. -Commutine Matrices 221 


In (17) we now set $V = U$ and $\tilde{B} = \tilde{A}$, and denote $X_{\tilde A\tilde A}$ simply by $X_{\tilde A}$. We then obtain all solutions of (21), i.e., all matrices that commute with $A$, in the following form:

$$X = UX_{\tilde A}U^{-1}, \qquad (23)$$

where $X_{\tilde A}$ denotes an arbitrary matrix permutable with $\tilde{A}$. As explained in the preceding section, $X_{\tilde A}$ is split into $u^2$ blocks

$$X_{\tilde A} = (X_{\alpha\beta})$$

corresponding to the splitting of the Jordan matrix $\tilde{A}$ into blocks. Each $X_{\alpha\beta}$ is the null matrix if $\lambda_\alpha \ne \lambda_\beta$, and an arbitrary regular upper triangular matrix if $\lambda_\alpha = \lambda_\beta$.

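As an example, we write down the elements of $X_{\tilde A}$ in the case where $\tilde{A}$ has the following elementary divisors:

$$(\lambda - \lambda_1)^4,\quad (\lambda - \lambda_1)^3,\quad (\lambda - \lambda_2)^2,\quad \lambda - \lambda_2 \qquad (\lambda_1 \ne \lambda_2).$$

In this case $X_{\tilde A}$ has the following form:

$$X_{\tilde A} = \left(\begin{array}{cccc|ccc|cc|c}
a & b & c & d & e & f & g & 0 & 0 & 0\\
0 & a & b & c & 0 & e & f & 0 & 0 & 0\\
0 & 0 & a & b & 0 & 0 & e & 0 & 0 & 0\\
0 & 0 & 0 & a & 0 & 0 & 0 & 0 & 0 & 0\\ \hline
0 & h & k & l & m & p & q & 0 & 0 & 0\\
0 & 0 & h & k & 0 & m & p & 0 & 0 & 0\\
0 & 0 & 0 & h & 0 & 0 & m & 0 & 0 & 0\\ \hline
0 & 0 & 0 & 0 & 0 & 0 & 0 & r & s & t\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & r & 0\\ \hline
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & w & z
\end{array}\right)$$

($a, b, \ldots, z$ are arbitrary parameters).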


The number of parameters in $X_{\tilde A}$ is equal to $N$, where

$$N = \sum_{\alpha,\beta=1}^{u}\delta_{\alpha\beta};$$

here $\delta_{\alpha\beta}$ denotes the degree of the greatest common divisor of the polynomials $(\lambda - \lambda_\alpha)^{p_\alpha}$ and $(\lambda - \lambda_\beta)^{p_\beta}$.

Let us bring the invariant polynomials of $A$ into the discussion: $i_1(\lambda), i_2(\lambda), \ldots, i_t(\lambda)$; $i_{t+1}(\lambda) = \cdots = i_n(\lambda) = 1$. We denote the degrees of these polynomials by $n_1 \ge n_2 \ge \cdots \ge n_t > n_{t+1} = \cdots = n_n = 0$. Each invariant polynomial is a product of certain co-prime elementary divisors. Therefore the formula for $N$ can be written as follows:




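$$N = \sum_{g,j=1}^{t}\kappa_{gj}, \qquad (24)$$

where $\kappa_{gj}$ is the degree of the greatest common divisor of $i_g(\lambda)$ and $i_j(\lambda)$ $(g, j = 1, 2, \ldots, t)$. But the greatest common divisor of $i_g(\lambda)$ and $i_j(\lambda)$ is one of these polynomials, so $\kappa_{gj} = \min(n_g, n_j) = n_{\max(g,j)}$. For each $k$ there are $2k - 1$ pairs $(g, j)$ with $\max(g, j) = k$. Hence we obtain:

$$N = n_1 + 3n_2 + \cdots + (2t - 1)n_t.$$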


$N$ is the number of linearly independent matrices that commute with $A$. By the remark at the end of the preceding section, we may assume that the elements of these matrices belong to the ground field $F$ containing the elements of $A$. We have arrived at the following theorem:


THEOREM 2: The number of linearly independent matrices that commute with the matrix $A = \|a_{ik}\|_1^n$ is given by the formula

$$N = n_1 + 3n_2 + \cdots + (2t - 1)n_t, \qquad (25)$$

where $n_1, n_2, \ldots, n_t$ are the degrees of the non-constant invariant polynomials $i_1(\lambda), i_2(\lambda), \ldots, i_t(\lambda)$ of $A$.

Note that

$$n = n_1 + n_2 + \cdots + n_t. \qquad (26)$$

From (25) and (26) it follows that

$$N \ge n, \qquad (27)$$

where the equality sign holds if and only if $t = 1$, i.e., if all the elementary divisors of $A$ are co-prime in pairs.
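Formula (25) needs only the degrees of the invariant polynomials. The Python sketch below uses hypothetical helper names. It assembles these degrees from the elementary divisors: $i_1(\lambda)$ takes the highest power of each $\lambda - \lambda_\alpha$, $i_2(\lambda)$ the next highest, and so on. For the example above this gives $n_1 = 6$, $n_2 = 4$ and $N = 6 + 3\cdot 4 = 18$, which matches the eighteen parameters $a, b, \ldots, z$ in $X_{\tilde A}$. The string labels stand for two distinct characteristic values.

```python
from collections import defaultdict


def invariant_degrees(blocks):
    """Degrees n_1 >= n_2 >= ... >= n_t of the non-constant invariant polynomials.

    blocks = [(lam, p), ...] lists the elementary divisors (lambda - lam)^p."""
    exponents = defaultdict(list)
    for lam, p in blocks:
        exponents[lam].append(p)
    for exps in exponents.values():
        exps.sort(reverse=True)
    t = max(len(exps) for exps in exponents.values())
    # i_k(lambda) collects the k-th highest power of every (lambda - lam)
    return [sum(exps[k] for exps in exponents.values() if k < len(exps)) for k in range(t)]


def commuting_count(blocks):
    """Formula (25): N = n_1 + 3 n_2 + ... + (2t - 1) n_t."""
    return sum((2 * k + 1) * nk for k, nk in enumerate(invariant_degrees(blocks)))


example = [("l1", 4), ("l1", 3), ("l2", 2), ("l2", 1)]   # lambda_1 != lambda_2
```

Two limiting cases illustrate (27). Co-prime elementary divisors give $t = 1$ and $N = n$. A scalar matrix of order $n$ gives $N = n^2$.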


2. Let $g(\lambda)$ be an arbitrary polynomial in $\lambda$. Then $g(A)$ is permutable with $A$. This raises the converse question: when can every matrix that is permutable with $A$ be expressed as a polynomial in $A$? Every matrix that commutes with $A$ would then be a linear combination of the linearly independent matrices

$$E, A, A^2, \ldots, A^{n_1 - 1}$$

(here $n_1$ is the degree of the minimal polynomial $i_1(\lambda)$). Hence $N \le n_1 \le n$; on comparing this with (27), we obtain: $N = n_1 = n$.

COROLLARY 1 TO THEOREM 2: All the matrices that are permutable with $A$ can be expressed as polynomials in $A$ if and only if $n_1 = n$, i.e., if all the elementary divisors of $A$ are co-prime in pairs.




3. The polynomials in a matrix that commutes with $A$ also commute with $A$. We raise the question: when can all the matrices that commute with $A$ be expressed as polynomials in one and the same matrix $C$? Consider the case in which they can be so expressed. By the Hamilton–Cayley Theorem the matrix $C$ satisfies its characteristic equation. Hence every matrix that commutes with $A$ must be a linear combination of the matrices

$$E, C, C^2, \ldots, C^{n-1}.$$

Therefore in this case $N \le n$. Comparing this with (27), we find that $N = n$. Hence from (25) and (26) we also have $n_1 = n$.


COROLLARY 2 TO THEOREM 2: All the matrices that are permutable with $A$ can be expressed in the form of polynomials in one and the same matrix $C$ if and only if $n_1 = n$, i.e., if and only if all the elementary divisors of $\lambda E - A$ are co-prime in pairs. In this case all the matrices that are permutable with $A$ can be represented in the form of polynomials in $A$.


4. We mention a very important property of permutable matrices.


THEOREM 3: Let two matrices $A = \|a_{ik}\|_1^n$ and $B = \|b_{ik}\|_1^n$ be permutable, and let one of them, say $A$, have quasi-diagonal form

$$A = \{A_1, A_2\}, \qquad (28)$$

where the matrices $A_1$ and $A_2$ do not have characteristic values in common. Then the other matrix also has the same quasi-diagonal form

$$B = \{B_1, B_2\}. \qquad (29)$$


Proof. We split $B$ into blocks corresponding to the quasi-diagonal form (28):

$$B = \begin{pmatrix} B_1 & X\\ Y & B_2 \end{pmatrix}.$$

From the relation $AB = BA$ we obtain four matrix equations:

$$1.\ A_1B_1 = B_1A_1, \quad 2.\ A_1X = XA_2, \quad 3.\ A_2Y = YA_1, \quad 4.\ A_2B_2 = B_2A_2. \qquad (30)$$

As we explained in § 1 (p. 220), the second and third of the equations in (30) only have the solutions $X = O$, $Y = O$, since $A_1$ and $A_2$ have no characteristic values in common. This proves our statement. The first and fourth of the equations in (30) express the permutability of $A_1$ and $B_1$ and of $A_2$ and $B_2$.




In geometrical language, this theorem runs as follows:


THEOREM 3′: Let

$$R = I_1 + I_2$$

be a decomposition of the whole space $R$ into subspaces $I_1$ and $I_2$ invariant with respect to an operator $A$, and let the minimal polynomials of these subspaces (with respect to $A$) be co-prime. Then $I_1$ and $I_2$ are invariant with respect to any linear operator $B$ that commutes with $A$.

Let us also give a geometrical proof of this statement. We denote by $\psi_1(\lambda)$ and $\psi_2(\lambda)$ the minimal polynomials of $I_1$ and $I_2$ with respect to $A$. Since they are co-prime, all the vectors of $R$ that satisfy the equation $\psi_1(A)x = o$ belong to $I_1$, and all the vectors that satisfy $\psi_2(A)x = o$ belong to $I_2$. Let $x_1 \in I_1$. Then $\psi_1(A)x_1 = o$. The permutability of $A$ and $B$ implies that of $\psi_1(A)$ and $B$, so that

$$\psi_1(A)Bx_1 = B\psi_1(A)x_1 = o,$$

i.e., $Bx_1 \in I_1$. The invariance of $I_2$ with respect to $B$ is proved similarly.
This theorem leads to a number of corollaries:

COROLLARY 1: Let the linear operators $A, B, \ldots, L$ be pairwise permutable. Then the whole space $R$ can be split into subspaces invariant with respect to all the operators $A, B, \ldots, L$,

$$R = I_1 + I_2 + \cdots + I_w,$$

such that the minimal polynomial of each of these subspaces with respect to any one of the operators $A, B, \ldots, L$ is a power of an irreducible polynomial.

As a special case of this we obtain:

COROLLARY 2: Let the linear operators $A, B, \ldots, L$ be pairwise permutable, and let all the characteristic values of these operators belong to the ground field. Then the whole space $R$ can be split into subspaces $I_1, I_2, \ldots, I_w$, invariant with respect to all the operators, such that in each of these subspaces every operator $A, B, \ldots, L$ has equal characteristic values.

Finally, we mention a further special case of this statement:

COROLLARY 3: If $A, B, \ldots, L$ are pairwise permutable operators of simple structure (see Chapter III, § 8), then a basis of the space can be formed from common characteristic vectors of these operators.

We also give the matrix form of the last statement: 


Permutable matrices of simple structure can be brought into diagonal 
form simultaneously by a similarity transformation. 


⁴ See Theorem 1 of Chapter VII (p. 179).


§4. Tue Scauar Equation f(X) =O 225 


§ 3. The Equation AX − XB = C


1. Suppose that the matrix equation

$$AX - XB = C \qquad (31)$$

is given. Here $A = \|a_{ij}\|_1^m$ and $B = \|b_{kl}\|_1^n$ are given square matrices of orders $m$ and $n$. $C = \|c_{jk}\|$ is a given and $X = \|x_{jk}\|$ an unknown rectangular matrix, both of dimension $m \times n$. The equation (31) is equivalent to a system of $mn$ scalar equations in the elements of $X$:

$$\sum_{j=1}^{m} a_{ij}x_{jk} - \sum_{l=1}^{n} x_{il}b_{lk} = c_{ik} \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n). \qquad (31')$$

The corresponding homogeneous system of equations

$$\sum_{j=1}^{m} a_{ij}x_{jk} - \sum_{l=1}^{n} x_{il}b_{lk} = 0 \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n)$$

can be written in matrix form as follows:

$$AX - XB = O. \qquad (32)$$

Thus, if (32) has only the trivial solution $X = O$, then (31) has a unique solution. But we have established in § 1 that (32) has only the trivial solution if and only if $A$ and $B$ do not have common characteristic values. Therefore, if $A$ and $B$ have no characteristic values in common, then (31) has a unique solution. If $A$ and $B$ do have characteristic values in common, then two cases may arise, depending on the 'constant' term $C$. Either the equation (31) is contradictory, or it has an infinite number of solutions given by the formula

$$X = X_1 + X_0,$$

where $X_1$ is a fixed particular solution of (31) and $X_0$ the general solution of the homogeneous equation (32) (the structure of $X_0$ was described in § 1).
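When $A$ and $B$ have no characteristic values in common, the unique solution of (31) can be computed directly from the scalar system (31′). The Python sketch below uses our own helper names and exact fractions. It assembles that $mn \times mn$ system, solves it by Gauss–Jordan elimination, and checks the result on small matrices chosen only for illustration. This brute-force approach costs on the order of $(mn)^3$ operations. Library routines such as SciPy's `solve_sylvester` use the Bartels–Stewart algorithm instead, which is not the method of the text.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve M z = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            raise ValueError("A and B have a common characteristic value")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mat_mul(P, Q):
    """Ordinary matrix product P Q."""
    return [[sum(P[i][r] * Q[r][j] for r in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def solve_ax_minus_xb(A, B, C):
    """Unique solution X of AX - XB = C, obtained from the scalar system (31')."""
    m, n = len(A), len(B)
    M, rhs = [], []
    for i in range(m):
        for k in range(n):
            eq = [0] * (m * n)                 # unknown x_jk has index j*n + k
            for j in range(m):
                eq[j * n + k] += A[i][j]       # sum_j a_ij x_jk
            for l in range(n):
                eq[i * n + l] -= B[l][k]       # - sum_l x_il b_lk
            M.append(eq)
            rhs.append(C[i][k])
    z = solve(M, rhs)
    return [[z[j * n + k] for k in range(n)] for j in range(m)]


A = [[1, 1], [0, 1]]      # characteristic value 1
B = [[2, 0], [1, 3]]      # characteristic values 2 and 3
C = [[1, 2], [3, 4]]
X = solve_ax_minus_xb(A, B, C)
```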


§ 4. The Scalar Equation f(X) = O 


lL. To begin with, let us consider the equation 


g(X) =0, (33) 
where 


g (A)= (A — Ay) (A — Ag) + (A—A,) 


226 VITI. Marrrx Equations 


is a given polynomial in the variable λ, and X is an unknown square matrix of order n. Since g(X) = O, the minimal polynomial of X, i.e., the first invariant polynomial $i_1(\lambda)$, must be a divisor of $g(\lambda)$. Hence the elementary divisors of X must have the following form:

$$(\lambda - \lambda_{j_1})^{p_{j_1}},\ (\lambda - \lambda_{j_2})^{p_{j_2}},\ \dots,\ (\lambda - \lambda_{j_v})^{p_{j_v}} \qquad (p_{j_1} \le \alpha_{j_1},\ p_{j_2} \le \alpha_{j_2},\ \dots,\ p_{j_v} \le \alpha_{j_v};\ p_{j_1} + p_{j_2} + \dots + p_{j_v} = n)$$

(some of the indices $j_1, j_2, \dots, j_v$ may be equal; n is the given order of the unknown matrix X).
We represent X in the form

$$X = T\{\lambda_{j_1}E^{(p_{j_1})} + H^{(p_{j_1})},\ \dots,\ \lambda_{j_v}E^{(p_{j_v})} + H^{(p_{j_v})}\}T^{-1}, \qquad (34)$$

where T is an arbitrary non-singular matrix of order n. By formula (34), the set of solutions of the equation (33) with a given order of the unknown matrix splits into a finite number of classes of similar matrices.
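Formula (34) can be checked on a small case. The Python sketch below assumes g(λ) = λ²(λ − 1), an order n = 3, and a particular non-singular T; all three are arbitrary illustrative choices. It builds X from Jordan blocks whose sizes satisfy p ≤ α and confirms that g(X) = O. It also shows that a block exceeding its bound α fails. The helper names are hypothetical.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def jordan_block(lam, p):
    """lam*E^(p) + H^(p): lam on the diagonal, ones on the first superdiagonal."""
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(p)] for i in range(p)]


def quasi_diag(blocks):
    """The quasi-diagonal matrix {B_1, B_2, ...}."""
    n = sum(len(b) for b in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                M[offset + i][offset + j] = v
        offset += len(b)
    return M


def inverse(M):
    """Exact inverse by Gauss-Jordan elimination (M is assumed non-singular)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def g_of_matrix(factors, X):
    """g(X) = (X - lam_1 E)^alpha_1 ... (X - lam_h E)^alpha_h, factors = [(lam_i, alpha_i)]."""
    n = len(X)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for lam, alpha in factors:
        shifted = [[X[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
        for _ in range(alpha):
            result = matmul(result, shifted)
    return result


# g(lambda) = lambda^2 (lambda - 1); block sizes p <= alpha, adding up to n = 3
factors = [(0, 2), (1, 1)]
T = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]          # any non-singular T will do (|T| = 2)
X = matmul(matmul(T, quasi_diag([jordan_block(0, 2), jordan_block(1, 1)])), inverse(T))
```

A single block `jordan_block(0, 3)` violates p ≤ α = 2, and the tests confirm that the corresponding matrix does not satisfy g(X) = O.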


Example 1. Let the equation

$$X^m = O \qquad (35)$$

be given.

If a certain power of a matrix is the null matrix, then the matrix is called nilpotent. The least exponent for which the power of the matrix is the null matrix is called the index of nilpotency.

Obviously, the solutions of (35) are all the nilpotent matrices with an index of nilpotency ≤ m. The formula that comprises all the solutions of a given order n looks as follows (T is an arbitrary non-singular matrix):

$$X = T\{H^{(p_1)}, H^{(p_2)}, \dots, H^{(p_v)}\}T^{-1} \qquad (p_1, p_2, \dots, p_v \le m;\ p_1 + p_2 + \dots + p_v = n). \qquad (36)$$
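Formula (36) has a direct computational counterpart: the index of nilpotency of $T\{H^{(p_1)}, \dots, H^{(p_v)}\}T^{-1}$ is $\max p_i$. The Python sketch below checks this on one case. It assumes the illustrative choice p₁ = 2, p₂ = 1 and a particular T whose inverse is written out by hand. The function name `nilpotency_index` is hypothetical.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def nilpotency_index(X):
    """Least k with X^k = O, or None if X is not nilpotent.

    For a matrix of order n the index never exceeds n, so n powers suffice."""
    n = len(X)
    zero = [[0] * n for _ in range(n)]
    power = X
    for k in range(1, n + 1):
        if power == zero:
            return k
        power = matmul(power, X)
    return None


# (36) with p_1 = 2, p_2 = 1 and an illustrative non-singular T
T = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
T_inv = [[1, 0, 0], [-1, 1, 0], [0, 0, 1]]
J = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]           # {H^(2), H^(1)}
X = matmul(matmul(T, J), T_inv)                 # solves X^m = O for every m >= 2
```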


Example 2. Let the equation

$$X^2 = X \qquad (37)$$

be given.

A matrix satisfying this equation is called idempotent. Here $g(\lambda) = \lambda(\lambda - 1)$ has simple roots, so the elementary divisors of an idempotent matrix can only be λ or λ − 1. Therefore an idempotent matrix can be described as a matrix of simple structure (i.e., reducible to diagonal form) with characteristic values 0 or 1. The formula comprising all the idempotent matrices of a given order n has the form

$$X = T\{\underbrace{1, 1, \dots, 1}_{r}, \underbrace{0, \dots, 0}_{n-r}\}T^{-1}, \qquad (38)$$

where T is an arbitrary non-singular matrix of order n.
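A short check of formula (38). The sketch below assumes an illustrative 2 × 2 matrix T with a hand-computed inverse. It builds $T\{1, \dots, 1, 0, \dots, 0\}T^{-1}$ and confirms that the result is idempotent and has trace r. The helper name `idempotent` is hypothetical.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def idempotent(T, T_inv, r):
    """X = T {1,...,1,0,...,0} T^{-1} as in (38), with r ones on the diagonal."""
    n = len(T)
    D = [[1 if (i == j and i < r) else 0 for j in range(n)] for i in range(n)]
    return matmul(matmul(T, D), T_inv)


T = [[1, 1], [0, 1]]
T_inv = [[1, -1], [0, 1]]
X = idempotent(T, T_inv, 1)     # [[1, -1], [0, 0]]
```

The trace of X equals the number r of unit characteristic values, because similar matrices have equal traces.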




2. Let us now consider the more general equation

$$f(X) = O, \qquad (39)$$


where f(λ) is a regular function of λ in some domain G of the complex plane. We require of the unknown solution $X = \|x_{ik}\|_1^n$ that its characteristic values belong to G. Let the zeros of f(λ) in G and their multiplicities be as follows:

Zeros: $\lambda_1, \lambda_2, \dots$
Multiplicities: $\alpha_1, \alpha_2, \dots$


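As in the preceding case, every elementary divisor of X must have the form

$$(\lambda - \lambda_j)^{p_j} \qquad (p_j \le \alpha_j),$$

and therefore

$$X = T\{\lambda_{j_1}E^{(p_{j_1})} + H^{(p_{j_1})},\ \dots,\ \lambda_{j_v}E^{(p_{j_v})} + H^{(p_{j_v})}\}T^{-1} \qquad (40)$$

$$(j_1, j_2, \dots, j_v = 1, 2, \dots;\ p_{j_1} \le \alpha_{j_1},\ p_{j_2} \le \alpha_{j_2},\ \dots,\ p_{j_v} \le \alpha_{j_v};\ p_{j_1} + p_{j_2} + \dots + p_{j_v} = n)$$

(T is an arbitrary non-singular matrix).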


§ 5. Matrix Polynomial Equations 
1. Let us consider the equations

$$A_0X^m + A_1X^{m-1} + \dots + A_m = O, \qquad (41)$$

$$Y^mA_0 + Y^{m-1}A_1 + \dots + A_m = O, \qquad (42)$$

where $A_0, A_1, \dots, A_m$ are given square matrices of order n and X, Y are unknown square matrices of the same order. The equation (33) investigated in the preceding section is a very special (one could almost say trivial) case of (41) and (42). It is obtained by setting $A_i = a_iE$, where $a_i$ is a number and $i = 0, 1, \dots, m$.

The following theorem establishes a connection between (41), (42), and 
(33). 


THEOREM 4: Every solution of the matrix equation

$$A_0X^m + A_1X^{m-1} + \dots + A_m = O$$

satisfies the scalar equation

$$g(X) = O, \qquad (43)$$

where

$$g(\lambda) = |A_0\lambda^m + A_1\lambda^{m-1} + \dots + A_m|. \qquad (44)$$

The same scalar equation is satisfied by every solution Y of the matrix equation

$$Y^mA_0 + Y^{m-1}A_1 + \dots + A_m = O.$$
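Proof. We denote by F(λ) the matrix polynomial

$$F(\lambda) = A_0\lambda^m + A_1\lambda^{m-1} + \dots + A_m.$$

Then the equations (41) and (42) can be written as follows (see p. 81), using the right value F(X) and the left value $\hat F(Y)$ of F(λ):

$$F(X) = O, \qquad \hat F(Y) = O.$$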


By the generalized Bézout Theorem (Chapter IV, § 3), if X and Y are solutions of these equations, the matrix polynomial F(λ) is divisible on the right by λE − X and on the left by λE − Y:

$$F(\lambda) = Q(\lambda)(\lambda E - X) = (\lambda E - Y)Q_1(\lambda).$$

Hence

$$g(\lambda) = |F(\lambda)| = |Q(\lambda)|\,\Delta(\lambda) = |Q_1(\lambda)|\,\Delta_1(\lambda), \qquad (45)$$

where $\Delta(\lambda) = |\lambda E - X|$ and $\Delta_1(\lambda) = |\lambda E - Y|$ are the characteristic polynomials of X and Y. By the Hamilton-Cayley Theorem (Chapter IV, § 4),

$$\Delta(X) = O, \qquad \Delta_1(Y) = O.$$

Therefore (45) implies that

$$g(X) = g(Y) = O,$$

and the theorem is proved.
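Theorem 4 can be checked on a concrete instance. The Python sketch below assumes n = 2 and m = 2 with integer entries, and it hard-codes the 2 × 2 determinant, so it does not cover larger orders. The matrices X, A₀, and A₁ are arbitrary choices, and A₂ is then chosen so that X solves (41). The sketch computes g(λ) = |F(λ)| as a scalar polynomial and evaluates it at X by Horner's scheme. All helper names are hypothetical.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def matadd(P, Q):
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]


def poly_mul(p, q):
    """Product of scalar polynomials stored as coefficient lists, lowest degree first."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def poly_sub(p, q):
    size = max(len(p), len(q))
    p = p + [0] * (size - len(p))
    q = q + [0] * (size - len(q))
    return [a - b for a, b in zip(p, q)]


def g_coefficients(coeffs):
    """g(lambda) = |A_0 lambda^m + A_1 lambda^(m-1) + ... + A_m| for 2 x 2 matrices A_i.

    Returns the coefficients of g, lowest degree first."""
    m = len(coeffs) - 1

    def entry(i, j):
        # entry (i, j) of F(lambda): the coefficient of lambda^k is A_{m-k}[i][j]
        return [coeffs[m - k][i][j] for k in range(m + 1)]

    return poly_sub(poly_mul(entry(0, 0), entry(1, 1)), poly_mul(entry(0, 1), entry(1, 0)))


def scalar_poly_at(p, X):
    """p[0] E + p[1] X + p[2] X^2 + ..., evaluated by Horner's scheme."""
    n = len(X)
    result = [[0] * n for _ in range(n)]
    for c in reversed(p):
        scalar = [[c if i == j else 0 for j in range(n)] for i in range(n)]
        result = matadd(matmul(result, X), scalar)
    return result


# Build an instance of (41) with m = 2 that X solves: choose X, A_0, A_1, then set A_2.
X = [[1, 2], [0, 3]]
A0 = [[1, 1], [0, 1]]
A1 = [[0, 1], [1, 0]]
A2 = [[-v for v in row] for row in matadd(matmul(A0, matmul(X, X)), matmul(A1, X))]
g = g_coefficients([A0, A1, A2])   # lambda^4 - lambda^3 - 12 lambda^2 + 21 lambda - 9
```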


Note that the Hamilton-Cayley Theorem is a special case of this theorem. For every square matrix A, when substituted for X, satisfies the equation

$$EX - A = O$$

(here m = 1, $A_0 = E$, $A_1 = -A$). Therefore, by the theorem just proved,

$$\Delta(A) = O,$$

where $\Delta(\lambda) = |\lambda E - A|$.




2. Theorem 4 can be generalized as follows:

THEOREM 5:⁵ If $X_0, X_1, \dots, X_m$ are pairwise permutable square matrices of order n that satisfy the matrix equation

$$A_0X_0 + A_1X_1 + \dots + A_mX_m = O \qquad (46)$$

($A_0, A_1, \dots, A_m$ are given square matrices of order n), then the same matrices $X_0, X_1, \dots, X_m$ satisfy the scalar equation

$$g(X_0, X_1, \dots, X_m) = O, \qquad (47)$$

where

$$g(\xi_0, \xi_1, \dots, \xi_m) = |A_0\xi_0 + A_1\xi_1 + \dots + A_m\xi_m|. \qquad (48)$$


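Proof. We set⁶

$$F(\xi_0, \xi_1, \dots, \xi_m) = \|f_{ik}(\xi_0, \xi_1, \dots, \xi_m)\|_1^n = A_0\xi_0 + A_1\xi_1 + \dots + A_m\xi_m,$$

where $\xi_0, \xi_1, \dots, \xi_m$ are scalar variables.

We denote by $\tilde F(\xi_0, \xi_1, \dots, \xi_m) = \|\tilde f_{ik}(\xi_0, \xi_1, \dots, \xi_m)\|_1^n$ the adjoint matrix of F ($\tilde f_{ik}$ is the algebraic complement of $f_{ki}$ in the determinant $|F(\xi_0, \xi_1, \dots, \xi_m)| = |f_{ik}|_1^n$; i, k = 1, 2, …, n). Then every element $\tilde f_{ik}$ (i, k = 1, 2, …, n) of $\tilde F$ is a homogeneous polynomial in $\xi_0, \xi_1, \dots, \xi_m$ of degree n − 1, so that $\tilde F$ can be represented in the form

$$\tilde F = \sum_{j_0 + j_1 + \dots + j_m = n-1} \tilde F_{j_0j_1\dots j_m}\,\xi_0^{j_0}\xi_1^{j_1}\cdots\xi_m^{j_m},$$

where the $\tilde F_{j_0j_1\dots j_m}$ are certain constant matrices of order n.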

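From the definition of $\tilde F$ there follows the identity

$$\tilde F \cdot F = g(\xi_0, \xi_1, \dots, \xi_m)\,E.$$

We write this in the following form:

$$\sum_{j_0 + j_1 + \dots + j_m = n-1} \tilde F_{j_0j_1\dots j_m}\,(A_0\xi_0 + A_1\xi_1 + \dots + A_m\xi_m)\,\xi_0^{j_0}\xi_1^{j_1}\cdots\xi_m^{j_m} = g(\xi_0, \xi_1, \dots, \xi_m)\,E. \qquad (49)$$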
The transition from the left-hand side of (49) to the right-hand side is accomplished by removing the parentheses and collecting similar terms. In this process we have to permute the variables $\xi_0, \xi_1, \dots, \xi_m$ among each other, but we never have to permute them with the matrix coefficients $A_i$ and $\tilde F_{j_0j_1\dots j_m}$. Therefore the equation (49) remains valid when we substitute for the variables $\xi_0, \xi_1, \dots, \xi_m$ the pairwise permutable matrices $X_0, X_1, \dots, X_m$:

$$\sum_{j_0 + j_1 + \dots + j_m = n-1} \tilde F_{j_0j_1\dots j_m}\,(A_0X_0 + A_1X_1 + \dots + A_mX_m)\,X_0^{j_0}X_1^{j_1}\cdots X_m^{j_m} = g(X_0, X_1, \dots, X_m). \qquad (50)$$

But, by assumption,

$$A_0X_0 + A_1X_1 + \dots + A_mX_m = O.$$

Therefore we find from (50):

$$g(X_0, X_1, \dots, X_m) = O,$$

and this is what we had to prove.

⁵ See [318].
⁶ The $f_{ik}(\xi_0, \xi_1, \dots, \xi_m)$ are linear forms in $\xi_0, \xi_1, \dots, \xi_m$ (i, k = 1, 2, …, n).
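Note 1. Theorem 5 remains valid if (46) is replaced by

$$X_0A_0 + X_1A_1 + \dots + X_mA_m = O. \qquad (51)$$

For we can apply Theorem 5 to the equation

$$A_0^{\mathsf T}X_0^{\mathsf T} + A_1^{\mathsf T}X_1^{\mathsf T} + \dots + A_m^{\mathsf T}X_m^{\mathsf T} = O$$

and then go over term by term to the transposed matrices; transposition does not change the determinant (48).

Note 2. Theorem 4 is obtained as a special case of Theorem 5 when we take for $X_0, X_1, \dots, X_m$ the matrices

$$X^m,\ X^{m-1},\ \dots,\ X,\ E.$$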


3. We have shown that every solution of (41) satisfies the scalar equation (of degree ≤ mn)

$$g(X) = O.$$

But the set of matrix solutions of this equation with a given order n splits into a finite number of classes of similar matrices (see § 4). Therefore all the solutions of (41) have to be looked for among the matrices of the form

$$T_iD_iT_i^{-1} \qquad (52)$$

(here the $D_i$ are well-defined matrices; if we wish, we may assume that the $D_i$ have Jordan normal form. The $T_i$ are arbitrary non-singular matrices of order n; i = 1, 2, …, ν). In (41) we substitute for X the matrix (52) and choose $T_i$ such that the equation (41) is satisfied. After multiplying on the right by $T_i$, we obtain for each $T_i$ a linear equation

$$A_0T_iD_i^m + A_1T_iD_i^{m-1} + \dots + A_mT_i = O \qquad (i = 1, 2, \dots, \nu). \qquad (53)$$


A natural method of finding solutions $T_i$ of (53) is to replace the matrix equation by a system of linear homogeneous scalar equations in the elements of the required matrix $T_i$. Each non-singular solution $T_i$ of (53), when substituted in (52), yields a solution of the given equation (41). Similar arguments may be applied to the equation (42).

In the following two sections we shall consider special cases of (41) connected with the extraction of m-th roots of a matrix.


§ 6. The Extraction of m-th Roots of a Non-Singular Matrix 
1. In this section and the following one, we deal with the equation

$$X^m = A, \qquad (54)$$

where A is a given matrix and X an unknown matrix (both of order n), and m is a given positive integer.

In this section we consider the case |A| ≠ 0 (A is non-singular). All the characteristic values of A are different from zero in this case (since |A| is the product of these characteristic values).

We denote by

$$(\lambda - \lambda_1)^{p_1},\ (\lambda - \lambda_2)^{p_2},\ \dots,\ (\lambda - \lambda_u)^{p_u} \qquad (55)$$

the elementary divisors of A and reduce A to Jordan normal form:⁷

$$A = U\tilde AU^{-1} = U\{\lambda_1E_1 + H_1,\ \lambda_2E_2 + H_2,\ \dots,\ \lambda_uE_u + H_u\}U^{-1}. \qquad (56)$$


Since the characteristic values of the unknown matrix X, when raised to the m-th power, give the characteristic values of A, all the characteristic values of X are also different from zero. Therefore the derivative of $f(\lambda) = \lambda^m$ does not vanish on these characteristic values. But then (see Chapter VI, p. 158) the elementary divisors of X do not 'decompose' when X is raised to the m-th power. It follows that the elementary divisors of X are:

$$(\lambda - \xi_1)^{p_1},\ (\lambda - \xi_2)^{p_2},\ \dots,\ (\lambda - \xi_u)^{p_u}, \qquad (57)$$

where $\xi_j^m = \lambda_j$, i.e., $\xi_j$ is one of the m-th roots of $\lambda_j$ ($\xi_j = \sqrt[m]{\lambda_j}$; j = 1, 2, …, u).
We now determine $\sqrt[m]{\lambda_jE_j + H_j}$ in the following way. In the λ-plane we take a circle with center $\lambda_j$ that does not contain the origin. In this circle we have m distinct branches of the function $\sqrt[m]{\lambda}$. These branches can be distinguished from one another by the value they assume at the center $\lambda_j$ of the circle. We denote by $\sqrt[m]{\lambda}$ the branch whose value at $\lambda_j$ coincides with the characteristic value $\xi_j$ of the unknown matrix X. Starting from this branch, we define the matrix function $\sqrt[m]{\lambda_jE_j + H_j}$ by means of the series

$$\sqrt[m]{\lambda_jE_j + H_j} = \xi_jE_j + \frac{1}{m}\,\frac{\xi_j}{\lambda_j}\,H_j + \frac{1}{2!}\,\frac{1}{m}\left(\frac{1}{m} - 1\right)\frac{\xi_j}{\lambda_j^2}\,H_j^2 + \cdots, \qquad (58)$$

which breaks off (since $H_j^{p_j} = O$).

⁷ Here $E_j = E^{(p_j)}$ and $H_j = H^{(p_j)}$ (j = 1, 2, …, u).
Since the derivative of the function $\sqrt[m]{\lambda}$ at $\lambda_j$ is not zero, the matrix (58) has only one elementary divisor $(\lambda - \xi_j)^{p_j}$, where $\xi_j = \sqrt[m]{\lambda_j}$ (j = 1, 2, …, u). Hence the quasi-diagonal matrix

$$\{\sqrt[m]{\lambda_1E_1 + H_1},\ \sqrt[m]{\lambda_2E_2 + H_2},\ \dots,\ \sqrt[m]{\lambda_uE_u + H_u}\}$$

has the elementary divisors (57), i.e., the same elementary divisors as the unknown matrix X. Therefore there exists a non-singular matrix T (|T| ≠ 0) such that

$$X = T\{\sqrt[m]{\lambda_1E_1 + H_1},\ \sqrt[m]{\lambda_2E_2 + H_2},\ \dots,\ \sqrt[m]{\lambda_uE_u + H_u}\}T^{-1}. \qquad (59)$$
In order to determine T, we note that if on both sides of the identity

$$(\sqrt[m]{\lambda})^m = \lambda$$

we substitute the matrix $\lambda_jE_j + H_j$ (j = 1, 2, …, u) in place of λ, we obtain:

$$(\sqrt[m]{\lambda_jE_j + H_j})^m = \lambda_jE_j + H_j \qquad (j = 1, 2, \dots, u).$$
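The series (58) and the identity just obtained can be checked exactly for a concrete block. The Python sketch below works with exact rationals. It assumes that λ_j is rational and has a rational m-th root ξ_j, which is only a convenience of the example. It uses the fact that the k-th Taylor coefficient of the chosen branch at λ_j is the generalized binomial coefficient C(1/m, k) times ξ_j λ_j^(−k). The function names are hypothetical.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def matpow(M, k):
    result = [[int(i == j) for j in range(len(M))] for i in range(len(M))]
    for _ in range(k):
        result = matmul(result, M)
    return result


def binom(a, k):
    """Generalized binomial coefficient a(a-1)...(a-k+1)/k! for rational a."""
    result = Fraction(1)
    for i in range(k):
        result = result * (a - i) / (i + 1)
    return result


def mth_root_of_block(lam, xi, m, p):
    """Series (58) for the m-th root of lam*E^(p) + H^(p) on the branch with value xi at lam.

    The k-th Taylor coefficient of lambda^(1/m) at lam is binom(1/m, k) * xi / lam**k;
    the series breaks off after p terms because H^p = O."""
    if xi ** m != lam:
        raise ValueError("xi must be an m-th root of lam")
    H = [[1 if j == i + 1 else 0 for j in range(p)] for i in range(p)]
    Hk = [[int(i == j) for j in range(p)] for i in range(p)]   # H^0 = E
    result = [[Fraction(0)] * p for _ in range(p)]
    for k in range(p):
        c = binom(Fraction(1, m), k) * xi / Fraction(lam) ** k
        result = [[r + c * h for r, h in zip(rrow, hrow)] for rrow, hrow in zip(result, Hk)]
        Hk = matmul(Hk, H)
    return result


R = mth_root_of_block(4, 2, 2, 3)   # square root of 4E + H on the branch with value 2
```

Passing ξ = −2 instead selects the other branch. This is the discrete multivalence described after (62).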
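Now from (54) and (59) it follows that

$$A = T\{\lambda_1E_1 + H_1,\ \lambda_2E_2 + H_2,\ \dots,\ \lambda_uE_u + H_u\}T^{-1}. \qquad (60)$$

Comparing (56) and (60), we see that $U^{-1}T$ commutes with $\tilde A$, so that

$$T = UX_{\tilde A}, \qquad (61)$$

where $X_{\tilde A}$ is an arbitrary non-singular matrix permutable with $\tilde A$ (the structure of $X_{\tilde A}$ is described in detail in § 2).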

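When we substitute the expression $UX_{\tilde A}$ for T in (59), we obtain a formula that comprises all the solutions of the equation (54):

$$X = UX_{\tilde A}\{\sqrt[m]{\lambda_1E_1 + H_1},\ \sqrt[m]{\lambda_2E_2 + H_2},\ \dots,\ \sqrt[m]{\lambda_uE_u + H_u}\}X_{\tilde A}^{-1}U^{-1}. \qquad (62)$$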


The multivalence of the right-hand side of this formula has both a discrete and a continuous character. The discrete (in this case finite) character arises from the choice of the distinct branches of the function $\sqrt[m]{\lambda}$ in the various blocks of the quasi-diagonal matrix; for $\lambda_j = \lambda_k$ the branches of $\sqrt[m]{\lambda}$ in the j-th and k-th diagonal blocks may even be distinct. The continuous character arises from the arbitrary parameters contained in $X_{\tilde A}$.




All solutions of (54) will be called m-th roots of A and will be denoted by the many-valued symbol $\sqrt[m]{A}$. We point out that $\sqrt[m]{A}$ is, in general, not a function of the matrix A (i.e., it is not representable in the form of a polynomial in A).
Note. If all the elementary divisors of A are co-prime in pairs, i.e., if the numbers $\lambda_1, \lambda_2, \dots, \lambda_u$ are all distinct, then the matrix $X_{\tilde A}$ has quasi-diagonal form

$$X_{\tilde A} = \{X_1, X_2, \dots, X_u\},$$

where $X_j$ is permutable with $\lambda_jE_j + H_j$. Therefore $X_j$ is permutable with every function of $\lambda_jE_j + H_j$ and, in particular, with $\sqrt[m]{\lambda_jE_j + H_j}$ (j = 1, 2, …, u). Therefore in this case (62) assumes the form

$$X = U\{\sqrt[m]{\lambda_1E_1 + H_1},\ \sqrt[m]{\lambda_2E_2 + H_2},\ \dots,\ \sqrt[m]{\lambda_uE_u + H_u}\}U^{-1}.$$

Thus, if the elementary divisors of A are co-prime in pairs, then only a discrete multivalence occurs in the formula for $X = \sqrt[m]{A}$. In this case every value of $\sqrt[m]{A}$ can be represented as a polynomial in A.
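2. Example. Suppose it is required to find all square roots of

$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

i.e., all solutions of the equation $X^2 = A$.

In this case A already has the Jordan normal form. Therefore in (62) we can set $\tilde A = A$, $U = E$. The matrix $X_{\tilde A}$ in this case looks as follows:

$$X_{\tilde A} = \begin{pmatrix} a & b & c \\ 0 & a & 0 \\ 0 & d & e \end{pmatrix},$$

where a, b, c, d, and e are arbitrary parameters.

The formula (62), which gives all the required solutions X, now assumes the following form:

$$X = \begin{pmatrix} a & b & c \\ 0 & a & 0 \\ 0 & d & e \end{pmatrix}\begin{pmatrix} \varepsilon & \frac{\varepsilon}{2} & 0 \\ 0 & \varepsilon & 0 \\ 0 & 0 & \eta \end{pmatrix}\begin{pmatrix} a & b & c \\ 0 & a & 0 \\ 0 & d & e \end{pmatrix}^{-1} \qquad (\varepsilon^2 = \eta^2 = 1). \qquad (63)$$

Without changing X we may multiply $X_{\tilde A}$ in (62) by a scalar so that $|X_{\tilde A}| = 1$. Since $|X_{\tilde A}| = a^2e$, this leads to the equation $a^2e = 1$, and hence $e = a^{-2}$.

Let us compute the elements of $X_{\tilde A}^{-1}$. For this purpose we write down the linear transformation with the coefficient matrix $X_{\tilde A}$:

$$y_1 = ax_1 + bx_2 + cx_3, \qquad y_2 = ax_2, \qquad y_3 = dx_2 + a^{-2}x_3.$$

We solve this system of equations with respect to $x_1, x_2, x_3$. Then we obtain the transformation with the inverse matrix $X_{\tilde A}^{-1}$:

$$x_1 = a^{-1}y_1 - (a^{-2}b - cd)y_2 - acy_3, \qquad x_2 = a^{-1}y_2, \qquad x_3 = -ady_2 + a^2y_3.$$

Hence we find:

$$X_{\tilde A}^{-1} = \begin{pmatrix} a & b & c \\ 0 & a & 0 \\ 0 & d & a^{-2} \end{pmatrix}^{-1} = \begin{pmatrix} a^{-1} & cd - a^{-2}b & -ac \\ 0 & a^{-1} & 0 \\ 0 & -ad & a^2 \end{pmatrix}.$$

The formula (63) yields:

$$X = \begin{pmatrix} \varepsilon & (\varepsilon - \eta)acd + \frac{\varepsilon}{2} & a^2c(\eta - \varepsilon) \\ 0 & \varepsilon & 0 \\ 0 & (\varepsilon - \eta)da^{-1} & \eta \end{pmatrix} = \begin{pmatrix} \varepsilon & (\varepsilon - \eta)vw + \frac{\varepsilon}{2} & (\eta - \varepsilon)v \\ 0 & \varepsilon & 0 \\ 0 & (\varepsilon - \eta)w & \eta \end{pmatrix} \qquad (v = a^2c,\ w = a^{-1}d). \qquad (64)$$

The solution X depends on two arbitrary parameters v and w and two arbitrary signs ε and η.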
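The closed form (64) can be cross-checked against the product (63). The Python sketch below uses exact rationals and the normalization e = a⁻² from the text. It assumes the sample values a = 2, b = 3, c = −1, d = 5 (any a ≠ 0 would do) and compares both expressions for all four sign choices. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def root_via_63(a, b, c, d, eps, eta):
    """Right-hand side of (63), normalised so that |X_A| = 1, i.e. e = a**-2."""
    a, b, c, d = map(Fraction, (a, b, c, d))
    XA = [[a, b, c], [0, a, 0], [0, d, 1 / a**2]]
    XA_inv = [[1 / a, c * d - b / a**2, -a * c], [0, 1 / a, 0], [0, -a * d, a**2]]
    S = [[eps, Fraction(eps, 2), 0], [0, eps, 0], [0, 0, eta]]   # branches eps, eta
    return matmul(matmul(XA, S), XA_inv)


def root_via_64(v, w, eps, eta):
    """Closed form (64) with v = a^2 c and w = a^{-1} d."""
    return [[eps, (eps - eta) * v * w + Fraction(eps, 2), (eta - eps) * v],
            [0, eps, 0],
            [0, (eps - eta) * w, eta]]


X = root_via_63(2, 3, -1, 5, 1, -1)   # one of the square roots of A
```

The parameter b drops out of X entirely, which matches (64).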


§ 7. The Extraction of m-th Roots of a Singular Matrix 


1. We pass on to the case where |A| = 0 (A is a singular matrix).

As in the first case, we reduce A to the Jordan normal form:

$$A = U\{\lambda_1E^{(p_1)} + H^{(p_1)},\ \dots,\ \lambda_uE^{(p_u)} + H^{(p_u)},\ H^{(q_1)},\ H^{(q_2)},\ \dots,\ H^{(q_t)}\}U^{-1}; \qquad (65)$$

here $(\lambda - \lambda_1)^{p_1}, \dots, (\lambda - \lambda_u)^{p_u}$ denote the elementary divisors of A that correspond to non-zero characteristic values, and $\lambda^{q_1}, \lambda^{q_2}, \dots, \lambda^{q_t}$ denote the elementary divisors with characteristic value zero.




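Then

$$A = U\{A_1, A_2\}U^{-1}, \qquad (66)$$

where

$$A_1 = \{\lambda_1E^{(p_1)} + H^{(p_1)},\ \dots,\ \lambda_uE^{(p_u)} + H^{(p_u)}\}, \qquad A_2 = \{H^{(q_1)}, H^{(q_2)}, \dots, H^{(q_t)}\}. \qquad (67)$$

Note that $A_1$ is a non-singular matrix ($|A_1| \ne 0$) and $A_2$ is a nilpotent matrix with index of nilpotency $\mu = \max(q_1, q_2, \dots, q_t)$ ($A_2^\mu = O$).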
The original equation (54) implies that A commutes with the unknown matrix X (indeed, $AX = X^{m+1} = XA$). Therefore the similar matrices

$$U^{-1}AU = \{A_1, A_2\} \quad\text{and}\quad U^{-1}XU \qquad (68)$$

also commute.

As we showed in § 2 (Theorem 3), the matrices (68) commute and $A_1$ and $A_2$ have no characteristic values in common. It follows that the second matrix in (68) has a corresponding quasi-diagonal form

$$U^{-1}XU = \{X_1, X_2\}. \qquad (69)$$

When we replace the matrices A and X in (54) by the similar matrices $\{A_1, A_2\}$ and $\{X_1, X_2\}$, we replace (54) by two equations:

$$X_1^m = A_1, \qquad (70)$$

$$X_2^m = A_2. \qquad (71)$$
Since $|A_1| \ne 0$, the results of the preceding section are applicable to (70). Therefore we find $X_1$ by the formula (62):

$$X_1 = X_{A_1}\{\sqrt[m]{\lambda_1E^{(p_1)} + H^{(p_1)}},\ \dots,\ \sqrt[m]{\lambda_uE^{(p_u)} + H^{(p_u)}}\}X_{A_1}^{-1}. \qquad (72)$$


Thus it remains to consider the equation (71), i.e., to find all m-th roots of the nilpotent matrix $A_2$, which already has the Jordan normal form

$$A_2 = \{H^{(q_1)}, H^{(q_2)}, \dots, H^{(q_t)}\}; \qquad (73)$$

$\mu = \max(q_1, q_2, \dots, q_t)$ is the index of nilpotency of $A_2$.

From $A_2^\mu = O$ and (71) we find

$$X_2^{m\mu} = O.$$

The last equation shows that the required matrix $X_2$ is also nilpotent, with an index of nilpotency ν. Since $X_2^{m(\mu-1)} = A_2^{\mu-1} \ne O$, we have $m(\mu - 1) < \nu \le m\mu$. We reduce $X_2$ to the Jordan form:


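$$X_2 = T\{H^{(v_1)}, H^{(v_2)}, \dots, H^{(v_s)}\}T^{-1} \qquad (v_1, v_2, \dots, v_s \le \nu). \qquad (74)$$

Now we raise both sides of (74) to the m-th power. We obtain:

$$A_2 = X_2^m = T\{[H^{(v_1)}]^m, [H^{(v_2)}]^m, \dots, [H^{(v_s)}]^m\}T^{-1}. \qquad (75)$$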


2. Let us now clarify the question of what elementary divisors the matrix $[H^{(v)}]^m$ has.⁸ We denote by H the linear operator given by $H^{(v)}$ in a v-dimensional vector space with the basis $e_1, e_2, \dots, e_v$. In $H^{(v)}$ all the elements of the first superdiagonal are equal to 1 and all the remaining elements are 0. It follows from the form of this matrix that

$$He_1 = 0, \quad He_2 = e_1, \quad \dots, \quad He_v = e_{v-1}. \qquad (76)$$

These equations show that the vectors $e_1, e_2, \dots, e_v$ form a Jordan chain for H corresponding to the elementary divisor $\lambda^v$.
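We write (76) as follows:

$$He_j = e_{j-1} \qquad (j = 1, 2, \dots, v;\ e_0 = 0).$$

Obviously,

$$H^me_j = e_{j-m} \qquad (j = 1, 2, \dots, v;\ e_0 = e_{-1} = \dots = e_{-m+1} = 0). \qquad (77)$$

We express v in the form

$$v = km + r \qquad (0 \le r < m),$$

where k and r are non-negative integers. We arrange the basis vectors $e_1, e_2, \dots, e_v$ in the following way:

$$\begin{matrix} e_1, & e_2, & \dots, & e_r, & e_{r+1}, & \dots, & e_m \\ e_{m+1}, & e_{m+2}, & \dots, & e_{m+r}, & e_{m+r+1}, & \dots, & e_{2m} \\ \vdots & & & & & & \vdots \\ e_{(k-1)m+1}, & e_{(k-1)m+2}, & \dots, & e_{(k-1)m+r}, & e_{(k-1)m+r+1}, & \dots, & e_{km} \\ e_{km+1}, & e_{km+2}, & \dots, & e_{km+r} & & & \end{matrix} \qquad (78)$$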


This table has m columns: the first r columns contain k + 1 vectors each, the remaining ones k vectors. The equation (77) shows that the vectors of each column form a Jordan chain with respect to the operator $H^m$. If instead of numbering the vectors (78) by rows we number them by columns, we obtain a new basis in which the matrix of the operator $H^m$ has the following Jordan normal form:⁹

$$\{\underbrace{H^{(k+1)}, \dots, H^{(k+1)}}_{r}, \underbrace{H^{(k)}, \dots, H^{(k)}}_{m-r}\},$$

and therefore

$$[H^{(v)}]^m = P_{v,m}\{\underbrace{H^{(k+1)}, \dots, H^{(k+1)}}_{r}, \underbrace{H^{(k)}, \dots, H^{(k)}}_{m-r}\}P_{v,m}^{-1}, \qquad (79)$$

where the matrix $P_{v,m}$ describes the transition from the one basis to the other (see Chapter III, § 4). $P_{v,m}$ is the permutation matrix of order v whose j-th column has its single 1 in the row of the vector that stands in the j-th place when the table (78) is read column by column, i.e., in the order

$$e_1, e_{m+1}, e_{2m+1}, \dots;\ e_2, e_{m+2}, e_{2m+2}, \dots;\ \dots;\ e_m, e_{2m}, \dots. \qquad (80)$$

⁸ This question is answered by Theorem 9 of Chapter VI (p. 158). Here we are compelled to use another method of investigating the problem, because we have to find not only the elementary divisors of the matrix $[H^{(v)}]^m$, but also a matrix $P_{v,m}$ transforming $[H^{(v)}]^m$ into Jordan form.

The matrix $H^{(v)}$ has the single elementary divisor $\lambda^v$. When $H^{(v)}$ is raised to the m-th power, this elementary divisor 'falls apart.' As (79) shows, $[H^{(v)}]^m$ has the elementary divisors:

$$\underbrace{\lambda^{k+1}, \dots, \lambda^{k+1}}_{r},\ \underbrace{\lambda^{k}, \dots, \lambda^{k}}_{m-r}.$$
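The construction (78) to (80) translates directly into code. The Python sketch below builds $P_{v,m}$ by reading the table (78) column by column, with 0-based indices standing for e₁, …, e_v. It then confirms (79) for a range of small v and m by checking that $P_{v,m}^{-1}[H^{(v)}]^m P_{v,m}$ is the predicted quasi-diagonal matrix. Since $P_{v,m}$ is a permutation matrix, its transpose serves as its inverse. The helper names are hypothetical.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def shift(v):
    """H^(v): ones on the first superdiagonal, zeros elsewhere."""
    return [[1 if j == i + 1 else 0 for j in range(v)] for i in range(v)]


def quasi_diag(blocks):
    n = sum(len(b) for b in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                M[offset + i][offset + j] = x
        offset += len(b)
    return M


def transition_matrix(v, m):
    """P_{v,m}: column j has its single 1 in the row of the j-th vector of table (78),
    when the table is read column by column (0-based indices for e_1..e_v)."""
    order = [i for col in range(m) for i in range(col, v, m)]
    P = [[0] * v for _ in range(v)]
    for j, i in enumerate(order):
        P[i][j] = 1
    return P


def split_sizes(v, m):
    """Block sizes of [H^(v)]^m: r blocks of size k+1 and m-r blocks of size k (v = km + r)."""
    k, r = divmod(v, m)
    return [s for s in [k + 1] * r + [k] * (m - r) if s > 0]


def check_79(v, m):
    Hm = [[int(i == j) for j in range(v)] for i in range(v)]
    for _ in range(m):
        Hm = matmul(Hm, shift(v))
    P = transition_matrix(v, m)
    P_inv = [list(col) for col in zip(*P)]     # a permutation matrix is inverted by transposition
    return matmul(matmul(P_inv, Hm), P) == quasi_diag([shift(s) for s in split_sizes(v, m)])
```

Blocks of size 0 are dropped in `split_sizes`. This matches footnote 9: when k = 0 only the r blocks $H^{(1)}$ remain.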


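Turning now to (75), we set:

$$v_i = k_im + r_i \qquad (0 \le r_i < m,\ k_i \ge 0;\ i = 1, 2, \dots, s). \qquad (81)$$

Then, by (79), equation (75) can be written as follows:

$$A_2 = X_2^m = TP\{\underbrace{H^{(k_1+1)}, \dots, H^{(k_1+1)}}_{r_1}, \underbrace{H^{(k_1)}, \dots, H^{(k_1)}}_{m-r_1}, \dots, \underbrace{H^{(k_s+1)}, \dots, H^{(k_s+1)}}_{r_s}, \underbrace{H^{(k_s)}, \dots, H^{(k_s)}}_{m-r_s}\}P^{-1}T^{-1}, \qquad (82)$$

where $P = \{P_{v_1,m}, P_{v_2,m}, \dots, P_{v_s,m}\}$.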
⁹ In the case k = 0, the blocks $H^{(k)}, \dots, H^{(k)}$ are absent, and the matrix has the form $\{\underbrace{H^{(1)}, \dots, H^{(1)}}_{r}\}$.


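Comparing (82) with (73), we see that the blocks

$$\underbrace{H^{(k_1+1)}, \dots, H^{(k_1+1)}}_{r_1}, \underbrace{H^{(k_1)}, \dots, H^{(k_1)}}_{m-r_1}, \dots, \underbrace{H^{(k_s+1)}, \dots, H^{(k_s+1)}}_{r_s}, \underbrace{H^{(k_s)}, \dots, H^{(k_s)}}_{m-r_s} \qquad (83)$$

must coincide, apart from the order, with the blocks

$$H^{(q_1)}, H^{(q_2)}, \dots, H^{(q_t)}. \qquad (84)$$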


3. Let us call a system of elementary divisors $\lambda^{v_1}, \lambda^{v_2}, \dots, \lambda^{v_s}$ admissible for $X_2$ if, after the matrix is raised to the m-th power, these elementary divisors split and generate the given system of elementary divisors of $A_2$: $\lambda^{q_1}, \lambda^{q_2}, \dots, \lambda^{q_t}$. The number of admissible systems of elementary divisors is always finite, because

$$\max(v_1, v_2, \dots, v_s) \le m\mu, \qquad v_1 + v_2 + \dots + v_s = n_2 \qquad (85)$$

($n_2$ is the order of $A_2$).

In every concrete case the admissible systems of elementary divisors for $X_2$ can easily be determined by a finite number of trials.
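The 'finite number of trials' can be carried out mechanically. The Python sketch below enumerates all partitions $v_1 \ge v_2 \ge \dots$ of $n_2$ with parts bounded by mμ, as allowed by (85). It keeps a partition when the split exponents (k+1 taken r times and k taken m − r times, for each $v_i = km + r$) coincide, up to order, with the given exponents $q_1, \dots, q_t$. This brute-force search is an illustrative choice, not an algorithm stated in the text, and the function names are hypothetical.

```python
def split_sizes(v, m):
    """Exponents into which lambda^v splits when the matrix is raised to the m-th power."""
    k, r = divmod(v, m)
    return [s for s in [k + 1] * r + [k] * (m - r) if s > 0]


def partitions(n, largest):
    """All partitions of n into parts <= largest, each listed in non-increasing order."""
    if n == 0:
        yield []
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield [part] + rest


def admissible_systems(q, m):
    """Exponent lists [v_1, ..., v_s] admissible for X_2 when A_2 has the elementary
    divisors lambda^q_1, ..., lambda^q_t; the search is bounded by (85)."""
    target = sorted(q, reverse=True)
    found = []
    for vs in partitions(sum(q), m * max(q)):
        pieces = sorted((s for v in vs for s in split_sizes(v, m)), reverse=True)
        if pieces == target:
            found.append(vs)
    return found
```

For example, `admissible_systems([2, 1], 2)` returns `[[3]]`, i.e. the single elementary divisor λ³. An empty result means that the m-th root does not exist.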

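Let us show that for each admissible system of elementary divisors $\lambda^{v_1}, \lambda^{v_2}, \dots, \lambda^{v_s}$ there exists a corresponding solution of (71), and let us determine all these solutions. In this case there exists a transforming matrix Q such that

$$\{\underbrace{H^{(k_1+1)}, \dots, H^{(k_1+1)}}_{r_1}, \underbrace{H^{(k_1)}, \dots, H^{(k_1)}}_{m-r_1}, \dots, \underbrace{H^{(k_s+1)}, \dots, H^{(k_s+1)}}_{r_s}, \underbrace{H^{(k_s)}, \dots, H^{(k_s)}}_{m-r_s}\} = Q^{-1}A_2Q. \qquad (86)$$

The matrix Q describes the permutation of the blocks in the quasi-diagonal matrix that brings about the proper renumbering of the basis vectors. Therefore Q can be regarded as known. Using (86), we obtain from (82):

$$A_2 = TPQ^{-1}A_2QP^{-1}T^{-1}.$$

Hence

$$TPQ^{-1} = X_{A_2},$$

or

$$T = X_{A_2}QP^{-1}, \qquad (87)$$

where $X_{A_2}$ is an arbitrary non-singular matrix that commutes with $A_2$.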
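Substituting (87) for T in (74), we have

$$X_2 = X_{A_2}QP^{-1}\{H^{(v_1)}, H^{(v_2)}, \dots, H^{(v_s)}\}PQ^{-1}X_{A_2}^{-1}. \qquad (88)$$

From (69), (72), and (88) we obtain a general formula which comprises all the solutions:

$$X = U\{X_{A_1}, X_{A_2}QP^{-1}\}\{\sqrt[m]{\lambda_1E^{(p_1)} + H^{(p_1)}}, \dots, \sqrt[m]{\lambda_uE^{(p_u)} + H^{(p_u)}}, H^{(v_1)}, \dots, H^{(v_s)}\}\{X_{A_1}^{-1}, PQ^{-1}X_{A_2}^{-1}\}U^{-1}. \qquad (89)$$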




We draw the reader's attention to the fact that the m-th root of a singular matrix does not always exist. Its existence depends on the existence of an admissible system of elementary divisors for $X_2$.

It is easy to see, for example, that the equation

$$X^m = H^{(p)}$$

has no solution for m > 1, p > 1.


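Example. Suppose it is required to extract the square root of

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

i.e., to find all the solutions of the equation $X^2 = A$.

In this case $A = A_2$, $X = X_2$, m = 2, t = 2, $q_1 = 2$, and $q_2 = 1$. The matrix X can only have the one elementary divisor $\lambda^3$. Therefore s = 1, $v_1 = 3$, $k_1 = 1$, $r_1 = 1$, and (see (80))

$$P = P_{3,2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} = P^{-1}, \qquad Q = E.$$

Moreover, as in the example on page 233, in (88) we may set:

$$X_{A_2} = \begin{pmatrix} a & b & c \\ 0 & a & 0 \\ 0 & d & a^{-2} \end{pmatrix}, \qquad X_{A_2}^{-1} = \begin{pmatrix} a^{-1} & cd - a^{-2}b & -ac \\ 0 & a^{-1} & 0 \\ 0 & -ad & a^2 \end{pmatrix}.$$

From this formula we obtain

$$X = X_2 = X_{A_2}P^{-1}H^{(3)}PX_{A_2}^{-1} = \begin{pmatrix} 0 & \alpha & \beta \\ 0 & 0 & 0 \\ 0 & \beta^{-1} & 0 \end{pmatrix},$$

where $\alpha = ca^{-1} - a^2d$ and $\beta = a^3$ are arbitrary parameters ($\beta \ne 0$).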
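The final formula of this example can be confirmed by direct multiplication. The Python sketch below evaluates $X_{A_2}P^{-1}H^{(3)}PX_{A_2}^{-1}$ exactly. It uses the normalization $e = a^{-2}$ carried over from the example on page 233 and assumes sample values for a, b, c, d, with a ≠ 0. It then checks the shape of the result and that $X^2 = A$. The function name is hypothetical.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def root_of_nilpotent_example(a, b, c, d):
    """X_2 = X_{A_2} P^{-1} H^(3) P X_{A_2}^{-1} from (88), with Q = E and e = a**-2."""
    a, b, c, d = map(Fraction, (a, b, c, d))
    XA = [[a, b, c], [0, a, 0], [0, d, 1 / a**2]]
    XA_inv = [[1 / a, c * d - b / a**2, -a * c], [0, 1 / a, 0], [0, -a * d, a**2]]
    P = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]          # P_{3,2}; here P^{-1} = P
    H3 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
    return matmul(matmul(matmul(matmul(XA, P), H3), P), XA_inv)


X = root_of_nilpotent_example(2, 7, 3, -1)
# shape [[0, alpha, beta], [0, 0, 0], [0, 1/beta, 0]] with
# alpha = c/a - a^2 d = 11/2 and beta = a^3 = 8
```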


§ 8. The Logarithm of a Matrix 
1. We consider the matrix equation

$$e^X = A. \qquad (90)$$

All the solutions of this equation are called (natural) logarithms of A and are denoted by ln A.




The characteristic values $\lambda_j$ of A are connected with the characteristic values $\xi_j$ of X by the formula $\lambda_j = e^{\xi_j}$. Therefore, if the equation (90) has a solution, then all the characteristic values of A are different from zero, and A is non-singular (|A| ≠ 0). Thus, the condition |A| ≠ 0 is necessary for the existence of solutions of the equation (90). Below we shall see that this condition is also sufficient.

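Suppose, then, that |A| ≠ 0. We write down the elementary divisors of A:

$$(\lambda - \lambda_1)^{p_1},\ (\lambda - \lambda_2)^{p_2},\ \dots,\ (\lambda - \lambda_u)^{p_u} \qquad (\lambda_1\lambda_2\cdots\lambda_u \ne 0;\ p_1 + p_2 + \dots + p_u = n). \qquad (91)$$

Corresponding to these elementary divisors, we reduce A to the Jordan normal form:

$$A = U\tilde AU^{-1} = U\{\lambda_1E^{(p_1)} + H^{(p_1)},\ \lambda_2E^{(p_2)} + H^{(p_2)},\ \dots,\ \lambda_uE^{(p_u)} + H^{(p_u)}\}U^{-1}. \qquad (92)$$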


Since the derivative of the function $e^\xi$ is different from zero for all values of ξ, we know (see Chapter VI, p. 158) that in the transition from X to $A = e^X$ the elementary divisors do not split. Hence X has the elementary divisors

$$(\lambda - \xi_1)^{p_1},\ (\lambda - \xi_2)^{p_2},\ \dots,\ (\lambda - \xi_u)^{p_u}, \qquad (93)$$

where $e^{\xi_j} = \lambda_j$, i.e., $\xi_j$ is one of the values of $\ln\lambda_j$ (j = 1, 2, …, u).

In the plane of the complex variable λ we draw a circle with center at $\lambda_j$ and with radius less than $|\lambda_j|$. We denote by $f_j(\lambda) = \ln\lambda$ the branch of the function ln λ in this circle that assumes at $\lambda_j$ the value equal to the characteristic value $\xi_j$ of X (j = 1, 2, …, u). After this, we set:

$$\ln(\lambda_jE^{(p_j)} + H^{(p_j)}) = f_j(\lambda_jE^{(p_j)} + H^{(p_j)}) = \xi_jE^{(p_j)} + \lambda_j^{-1}H^{(p_j)} - \frac{1}{2}\lambda_j^{-2}[H^{(p_j)}]^2 + \cdots. \qquad (94)$$


Since the derivative of ln λ vanishes nowhere (in the finite part of the λ-plane), the matrix (94) has only the one elementary divisor $(\lambda - \xi_j)^{p_j}$. Therefore the quasi-diagonal matrix

$$\{\ln(\lambda_1E^{(p_1)} + H^{(p_1)}),\ \ln(\lambda_2E^{(p_2)} + H^{(p_2)}),\ \dots,\ \ln(\lambda_uE^{(p_u)} + H^{(p_u)})\} \qquad (95)$$

has the same elementary divisors as the unknown matrix X. Therefore there exists a matrix T (|T| ≠ 0) such that




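$$X = T\{\ln(\lambda_1E^{(p_1)} + H^{(p_1)}),\ \dots,\ \ln(\lambda_uE^{(p_u)} + H^{(p_u)})\}T^{-1}. \qquad (96)$$

In order to determine T, we note that $e^{f_j(\lambda)} = \lambda$ in the j-th circle, so that

$$A = e^X = T\{\lambda_1E^{(p_1)} + H^{(p_1)},\ \dots,\ \lambda_uE^{(p_u)} + H^{(p_u)}\}T^{-1}. \qquad (97)$$

Comparing (97) and (92), we find:

$$T = UX_{\tilde A}, \qquad (98)$$

where $X_{\tilde A}$ is an arbitrary non-singular matrix that commutes with $\tilde A$. Substituting the expression for T from (98) into (96), we obtain a general formula that comprises all the logarithms of the matrix:

$$X = UX_{\tilde A}\{\ln(\lambda_1E^{(p_1)} + H^{(p_1)}),\ \ln(\lambda_2E^{(p_2)} + H^{(p_2)}),\ \dots,\ \ln(\lambda_uE^{(p_u)} + H^{(p_u)})\}X_{\tilde A}^{-1}U^{-1}. \qquad (99)$$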
Note. If all the elementary divisors of A are co-prime, then the factors $X_{\tilde A}$ and $X_{\tilde A}^{-1}$ on the right-hand side of (99) can be omitted (see a similar remark on p. 233).
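The series (94) and the relation $e^{\ln(\lambda_jE + H)} = \lambda_jE + H$ used in (97) can be checked numerically. The Python sketch below uses floating-point complex arithmetic from the standard library. It selects the branch by its value ξ = Log λ + 2πi·`branch` at λ, which is an illustrative parametrization. The k-th Taylor coefficient of ln at λ is (−1)^(k+1)/(kλ^k). The matrix exponential is computed only for matrices of the form ξE + N with N strictly upper triangular, which is exactly the shape (94) produces. The function names are hypothetical.

```python
import cmath
import math


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def log_of_block(lam, p, branch=0):
    """Series (94) for ln(lam*E^(p) + H^(p)), lam != 0.

    The branch is fixed by its value xi = Log(lam) + 2*pi*i*branch at lam; the k-th
    Taylor coefficient of ln at lam is (-1)**(k+1) / (k * lam**k), and the series
    breaks off after p terms because H^p = O."""
    xi = cmath.log(lam) + 2j * math.pi * branch
    H = [[1 if j == i + 1 else 0 for j in range(p)] for i in range(p)]
    result = [[xi if i == j else 0 for j in range(p)] for i in range(p)]
    Hk = H
    for k in range(1, p):
        coeff = (-1) ** (k + 1) / (k * lam ** k)
        result = [[r + coeff * h for r, h in zip(rrow, hrow)] for rrow, hrow in zip(result, Hk)]
        Hk = matmul(Hk, H)
    return result


def exp_of_triangular(X):
    """e^X for X = xi*E + N with N strictly upper triangular: e^xi (E + N + N^2/2! + ...)."""
    p = len(X)
    xi = X[0][0]
    N = [[X[i][j] if j > i else 0 for j in range(p)] for i in range(p)]
    term = [[float(i == j) for j in range(p)] for i in range(p)]
    total = [row[:] for row in term]
    for k in range(1, p):
        term = [[v / k for v in row] for row in matmul(term, N)]      # N^k / k!
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return [[cmath.exp(xi) * v for v in row] for row in total]


L0 = log_of_block(2, 4)            # principal branch
L1 = log_of_block(2, 4, branch=1)  # another logarithm of the same block
```

Different values of `branch` give different logarithms with the same exponential. This is the discrete part of the multivalence of ln A.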


CHAPTER IX 


LINEAR OPERATORS IN A UNITARY SPACE 


§ 1. General Considerations 


In Chapters III and VII we studied linear operators in an arbitrary
n-dimensional vector space. All the bases of such a space are of equal stand- 
ing. To a given linear operator there corresponds in each basis a certain 
matrix. The matrices corresponding to one and the same operator in the 
various bases are similar. Thus, the study of linear operators in an n-dimen- 
sional vector space enables us to bring out those properties of matrices that 
are inherent in an entire class of similar matrices. 

At the beginning of this chapter we shall introduce a metric into an 
n-dimensional space by assigning in a special way to each pair of vectors 
a certain number, the ‘scalar product’ of the two vectors. By means of the 
scalar product we shall define the ‘length’ of a vector and the cosine of the 
‘angle’ between two vectors. This metrization leads to a unitary space 
if the ground field F is the field of all complex numbers and to a euclidean 
space if F is the field of all real numbers. 

In the present chapter we shall study the properties of linear operators that are connected with the metric of the space. With respect to the metric, the bases of the space are by no means all of equal standing. However, all orthonormal bases are of equal standing. The transition from one orthonormal basis to another in a unitary space is brought about by means of a special transformation, namely a unitary one (in a euclidean space, an orthogonal transformation). Therefore all the matrices that correspond to one and the same linear operator in two distinct orthonormal bases of a unitary (euclidean) space are unitarily (orthogonally) similar. Thus, by studying linear operators in an n-dimensional metrized space we study the properties of matrices that remain invariant under transition from a given matrix to a unitarily (or orthogonally) similar one. This will lead in a natural way to the investigation of properties of special classes of matrices (normal, hermitian, unitary, symmetric, skew-symmetric, and orthogonal matrices).




§ 2. Metrization of a Space 


1. We consider a vector space R over the field of complex numbers. To every pair of vectors x and y of R given in a definite order let a certain complex number be assigned, the so-called scalar product, or inner product, of the vectors, denoted by (xy) or (x, y). Suppose further that the 'scalar multiplication' has the following properties:

For arbitrary vectors x, y, z of R and an arbitrary complex number α, let¹

1. $(xy) = \overline{(yx)}$,
2. $(\alpha x, y) = \alpha(xy)$, (1)
3. $(x + y, z) = (xz) + (yz)$.

Then we shall say that a hermitian metric is introduced in R.

Note that 1., 2., and 3. have the following consequences for arbitrary x, y, z in R:

2'. $(x, \alpha y) = \bar\alpha(xy)$,
3'. $(x, y + z) = (xy) + (xz)$.

From 1. we deduce that for every vector x the scalar product (xx) is a real number. This number is called the norm of x and is denoted by Nx:

$$Nx = (x, x).$$

If for every vector x of R

4. $Nx = (xx) \ge 0$, (2)

then the hermitian metric is called positive semi-definite. And if, moreover,

5. $Nx = (xx) > 0$ for $x \ne o$, (3)

then the hermitian metric is called positive definite.

DEFINITION 1: A vector space R with a positive-definite hermitian metric will be called a unitary space.²

In this chapter we shall consider finite-dimensional unitary spaces.³


By the length of the vector x we mean⁴ $|x| = +\sqrt{Nx} = +\sqrt{(x, x)}$. From 2. and 5. it follows that every vector other than the null vector has a positive length and that the null vector has length 0. A vector x is called normalized (or is said to be a unit vector) if |x| = 1. To normalize an arbitrary vector x ≠ o it is sufficient to multiply it by any complex number λ for which $|\lambda| = \frac{1}{|x|}$.

¹ A number with a bar over it denotes the complex conjugate of the number.
² The study of n-dimensional vector spaces with an arbitrary (not positive-definite) metric is taken up in the paper [319].
³ In §§ 2-7 of this chapter, wherever it is not expressly stated that the space is finite-dimensional, all the arguments remain valid for infinite-dimensional spaces.
⁴ The symbol $+\sqrt{\ }$ denotes the non-negative (arithmetical) value of the root.

By analogy with the ordinary three-dimensional vector spaces, two vectors x and y are called orthogonal (in symbols: x ⊥ y) if (xy) = 0. In this case it follows from 1., 3., and 3'. that

$$N(x + y) = (x + y, x + y) = (xx) + (yy) = Nx + Ny,$$

i.e. (the theorem of Pythagoras!),

$$|x + y|^2 = |x|^2 + |y|^2 \qquad (x \perp y).$$


Let R be a unitary space of finite dimension n. We consider an arbitrary basis $e_1, e_2, \dots, e_n$ of R. Let us denote by $x_i$ and $y_i$ (i = 1, 2, …, n) the coordinates of the vectors x and y in this basis:

$$x = \sum_{i=1}^{n} x_ie_i, \qquad y = \sum_{i=1}^{n} y_ie_i.$$

Then by 2., 3., 2'., and 3'.,

$$(xy) = \sum_{i,k=1}^{n} h_{ik}x_i\bar y_k, \qquad (4)$$

where

$$h_{ik} = (e_ie_k) \qquad (i, k = 1, 2, \dots, n). \qquad (5)$$

In particular,

$$Nx = (xx) = \sum_{i,k=1}^{n} h_{ik}x_i\bar x_k. \qquad (6)$$

From 1. and (5) we deduce

$$h_{ki} = \bar h_{ik} \qquad (i, k = 1, 2, \dots, n). \qquad (7)$$
2. A form $\sum_{i,k=1}^{n} h_{ik}x_i\bar x_k$, where $h_{ki} = \bar h_{ik}$ (i, k = 1, 2, …, n), is called hermitian.⁵ Thus, the norm of a vector, i.e., the square of its length, is a hermitian form in its coordinates. Hence the name 'hermitian metric.' The form on the right-hand side of (6) is, by 4., non-negative:

$$\sum_{i,k=1}^{n} h_{ik}x_i\bar x_k \ge 0 \qquad (8)$$

for all values of the variables $x_1, x_2, \dots, x_n$. By the additional condition 5., the form is in fact positive definite, i.e., the equality sign in (8) holds only when all the $x_i$ are zero (i = 1, 2, …, n).


⁵ In accordance with this, the expression on the right-hand side of (4) is called a hermitian bilinear form (in $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$).
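Formulas (4), (5), and (7) can be illustrated in ℂ². The Python sketch below assumes that the scalar product is the standard one in ℂ², i.e. formula (10) with respect to the standard orthonormal basis. It also assumes a non-orthonormal basis e₁ = (1, 0), e₂ = (1, i), chosen only for illustration. The sketch computes the coefficients h_ik = (e_i e_k) and checks that the hermitian form (4) in the coordinates reproduces the scalar product of the vectors themselves. All names are hypothetical.

```python
def inner(u, v):
    """Scalar product in C^n w.r.t. the standard orthonormal basis, formula (10):
    linear in u and conjugate-linear in v, matching properties 2. and 2'."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


basis = [(1, 0), (1, 1j)]                                 # illustrative non-orthonormal basis
h = [[inner(ei, ek) for ek in basis] for ei in basis]     # h_ik = (e_i e_k), formula (5)


def vector(coords):
    """The vector x = x_1 e_1 + x_2 e_2, written in the standard basis."""
    return tuple(sum(c * e[t] for c, e in zip(coords, basis)) for t in range(len(basis[0])))


def inner_from_coords(x, y):
    """(xy) = sum over i, k of h_ik x_i conj(y_k), formula (4)."""
    n = len(h)
    return sum(h[i][k] * x[i] * y[k].conjugate() for i in range(n) for k in range(n))


x, y = (2, 1j), (1 - 1j, 3)
value = inner_from_coords(x, y)    # equals inner(vector(x), vector(y)), namely 7 + 9j
```

In this basis h = [[1, 1], [1, 2]], which is hermitian (7) and positive definite, as property 5. requires.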


DEFINITION 2: A system of vectors $e_1, e_2, \dots, e_m$ is called orthonormal if

$$(e_ie_k) = \delta_{ik} = \begin{cases} 0, & i \ne k, \\ 1, & i = k \end{cases} \qquad (i, k = 1, 2, \dots, m). \qquad (9)$$

When m = n, where n is the dimension of the space, we obtain an orthonormal basis of the space.

In § 7 we shall prove that every n-dimensional unitary space has an orthonormal basis.

Let $x_i$ and $y_i$ (i = 1, 2, …, n) be the coordinates of x and y in an orthonormal basis. Then by (4), (5), and (9)

$$(xy) = \sum_{i=1}^{n} x_i\bar y_i, \qquad Nx = (xx) = \sum_{i=1}^{n} |x_i|^2. \qquad (10)$$

Let us take an arbitrary fixed basis in an n-dimensional space R. In this basis every metrization of the space is connected with a certain positive-definite hermitian form $\sum_{i,k=1}^{n} h_{ik}x_i\bar x_k$; and conversely, by (4), every such form determines a certain positive-definite hermitian metric in R. However, these metrics do not all give essentially different unitary n-dimensional spaces. For let us take two such metrics with the respective scalar products (xy) and (xy)'. We determine orthonormal bases in R with respect to these metrics: $e_i$ and $e_i'$ (i = 1, 2, …, n). Let the vector x in R be mapped onto the vector x' in R, where x' is the vector whose coordinates in the basis $e_i'$ are the same as the coordinates of x in the basis $e_i$ (i = 1, 2, …, n) (x → x'). This mapping is affine.⁶ Moreover, by (10),

$$(xy) = (x'y')'.$$

Therefore: To within an affine transformation of the space, all positive-definite hermitian metrizations of an n-dimensional vector space coincide.

⁶ I.e., the operator A that maps the vector x of R onto the vector x' of R is linear and non-singular.

If the field F is the field of real numbers, then a metric satisfying the postulates 1., 2., 3., 4., and 5. is called euclidean.

DEFINITION 3: A vector space R over the field of real numbers with a positive euclidean metric is called a euclidean space.

If $x_i$ and $y_i$ (i = 1, 2, …, n) are the coordinates of the vectors x and y in some basis $e_1, e_2, \dots, e_n$ of an n-dimensional euclidean space, then

$$(xy) = \sum_{i,k=1}^{n} s_{ik}x_iy_k, \qquad Nx = |x|^2 = \sum_{i,k=1}^{n} s_{ik}x_ix_k.$$

Here $s_{ik} = s_{ki}$ (i, k = 1, 2, …, n) are real numbers.⁷ The expression $\sum_{i,k=1}^{n} s_{ik}x_ix_k$ is called a quadratic form in $x_1, x_2, \dots, x_n$. Since the metric is positive definite, the quadratic form $\sum_{i,k=1}^{n} s_{ik}x_ix_k$, which gives this metric analytically, is also positive definite, i.e., $\sum_{i,k=1}^{n} s_{ik}x_ix_k > 0$ if $\sum_{i=1}^{n} x_i^2 > 0$.

In an orthonormal basis

$$(xy) = \sum_{i=1}^{n} x_iy_i, \qquad Nx = |x|^2 = \sum_{i=1}^{n} x_i^2. \qquad (11)$$
aim tel 


For n= 3 we obtain the well-known formulas for the scalar product of 
two vectors and for the square of the length of a vector in a three-dimensional 
euclidean space. 


§ 3. Gram’s Criterion for Linear Dependence of Vectors 


1. Suppose that the vectors $x_1, x_2, \ldots, x_m$ of a unitary or of a euclidean space R are linearly dependent, i.e., that there exist numbers⁸ $c_1, c_2, \ldots, c_m$, not all zero, such that

$$c_1x_1 + c_2x_2 + \cdots + c_mx_m = 0. \qquad (12)$$


When we perform the scalar multiplication by $x_1, x_2, \ldots, x_m$ in succession on both sides of this equation, we obtain

$$\begin{aligned}
(x_1x_1)\,\bar c_1 + (x_1x_2)\,\bar c_2 + \cdots + (x_1x_m)\,\bar c_m &= 0,\\
(x_2x_1)\,\bar c_1 + (x_2x_2)\,\bar c_2 + \cdots + (x_2x_m)\,\bar c_m &= 0,\\
&\ \,\vdots\\
(x_mx_1)\,\bar c_1 + (x_mx_2)\,\bar c_2 + \cdots + (x_mx_m)\,\bar c_m &= 0.
\end{aligned}\qquad (13)$$


Regarding $\bar c_1, \bar c_2, \ldots, \bar c_m$ as a non-zero solution of the system (13) of linear homogeneous equations with the determinant

$$G(x_1, x_2, \ldots, x_m) = \begin{vmatrix} (x_1x_1) & (x_1x_2) & \cdots & (x_1x_m)\\ (x_2x_1) & (x_2x_2) & \cdots & (x_2x_m)\\ \vdots & \vdots & & \vdots\\ (x_mx_1) & (x_mx_2) & \cdots & (x_mx_m) \end{vmatrix}, \qquad (14)$$

we conclude that this determinant must vanish:

$$G(x_1, x_2, \ldots, x_m) = 0.$$

⁷ $s_{ik} = (e_ie_k)$ ($i, k = 1, 2, \ldots, n$).

⁸ In the case of a euclidean space, $c_1, c_2, \ldots, c_m$ are real numbers.


$G(x_1, x_2, \ldots, x_m)$ is called the Gramian of the vectors $x_1, x_2, \ldots, x_m$.

Suppose, conversely, that the Gramian (14) is zero. Then the system of equations (13) has a non-zero solution $\bar c_1, \bar c_2, \ldots, \bar c_m$. Equations (13) can be written as follows:

$$\begin{aligned}
(x_1,\ c_1x_1 + c_2x_2 + \cdots + c_mx_m) &= 0,\\
&\ \,\vdots\\
(x_m,\ c_1x_1 + c_2x_2 + \cdots + c_mx_m) &= 0.
\end{aligned}$$

Multiplying these equations by $c_1, c_2, \ldots, c_m$ respectively, and then adding, we obtain:

$$N(c_1x_1 + c_2x_2 + \cdots + c_mx_m) = 0;$$

and since the metric is positive definite,

$$c_1x_1 + c_2x_2 + \cdots + c_mx_m = 0,$$

i.e., the vectors $x_1, x_2, \ldots, x_m$ are linearly dependent.
Thus we have proved: 


THEOREM 1: The vectors $x_1, x_2, \ldots, x_m$ are linearly independent if and only if their Gramian is not equal to zero.


We note the following property of the Gramian: 
If any principal minor of the Gramian is zero, then the Gramian is zero.

For a principal minor is the Gramian of part of the vectors. When this principal minor vanishes, it follows that these vectors are linearly dependent and then the whole system of vectors is dependent.
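Theorem 1 translates directly into a numerical test. The sketch below assumes the vectors are given by their coordinates in an orthonormal basis, so that $(xy) = \sum_i x_i\bar y_i$ as in (10); the helper names (`inner`, `det`, `gramian`, `linearly_independent`) and the tolerance `tol` are illustrative choices, not notation from the text. Because of floating-point rounding, a vanishing Gramian can only be detected up to such a tolerance.

```python
def inner(x, y):
    """Scalar product (xy) = sum x_i * conj(y_i) in an orthonormal basis (linear in x)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def det(m):
    """Determinant by Gaussian elimination with partial pivoting (int, float or complex)."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(a[i][j]))
        if a[p][j] == 0:
            return 0
        if p != j:
            a[j], a[p] = a[p], a[j]
            d = -d
        d *= a[j][j]
        for i in range(j + 1, n):
            f = a[i][j] / a[j][j]
            for k in range(j, n):
                a[i][k] -= f * a[j][k]
    return d


def gramian(vectors):
    """G(x_1, ..., x_m) = det ||(x_i x_k)||, formula (14)."""
    return det([[inner(xi, xk) for xk in vectors] for xi in vectors])


def linearly_independent(vectors, tol=1e-9):
    """Theorem 1: independent iff the Gramian is non-zero (here: larger than tol)."""
    return abs(gramian(vectors)) > tol


print(gramian([(1, 2, 0), (0, 1, 1)]))                          # approximately 6
print(linearly_independent([(1, 2, 0), (0, 1, 1), (1, 3, 1)]))  # third vector = first + second
```

With exact coordinates (for instance `fractions.Fraction` values) the dependent case yields exactly 0; with floats it yields a number of the order of the rounding error.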


2. Example. Let $f_1(t), f_2(t), \ldots, f_n(t)$ be n complex functions of a real argument t, sectionally continuous in the closed interval $[\alpha, \beta]$. It is required to determine conditions under which they are linearly dependent. For this purpose, we introduce a positive-definite metric into the space of functions sectionally continuous in $[\alpha, \beta]$ by setting

$$(f, g) = \int_\alpha^\beta f(t)\,\overline{g(t)}\,dt.$$

Then Gram’s criterion (Theorem 1) applied to the given functions yields the required condition:

$$\begin{vmatrix} \int_\alpha^\beta f_1\bar f_1\,dt & \cdots & \int_\alpha^\beta f_1\bar f_n\,dt\\ \vdots & & \vdots\\ \int_\alpha^\beta f_n\bar f_1\,dt & \cdots & \int_\alpha^\beta f_n\bar f_n\,dt \end{vmatrix} = 0.$$


§ 4. Orthogonal Projection 


1. Let x be an arbitrary vector in a unitary or euclidean space R and S an m-dimensional subspace with a basis $x_1, x_2, \ldots, x_m$. We shall show that x can be represented (and moreover, represented uniquely) in the form

$$x = x_S + x_N, \qquad (15)$$

where

$$x_S \in S \quad\text{and}\quad x_N \perp S$$

(the symbol $\perp$ denotes orthogonality of vectors; orthogonality to a subspace means orthogonality to every vector of the subspace); $x_S$ is the orthogonal projection of x onto S, $x_N$ the projecting vector.

Example. Let R be a three-dimensional euclidean vector space and m = 2. Let all vectors originate at a fixed point O. Then S is a plane passing through O; $x_S$ is the orthogonal projection of x onto the plane S; $x_N$ is the perpendicular dropped from the endpoint of x onto the plane S (Fig. 5); and $h = |x_N|$ is the distance of the endpoint of x from S.

[Fig. 5]

To establish the decomposition (15), we represent the required $x_S$ in the form

$$x_S = c_1x_1 + c_2x_2 + \cdots + c_mx_m, \qquad (16)$$

where $c_1, c_2, \ldots, c_m$ are complex numbers.⁹

⁹ In the case of a euclidean space, $c_1, c_2, \ldots, c_m$ are real numbers.


To determine these numbers we shall start from the relations

$$(x - x_S,\ x_k) = 0 \qquad (k = 1, 2, \ldots, m). \qquad (17)$$

When we substitute in (17) for $x_S$ its expression (16), we obtain:

$$\begin{aligned}
(x_1x_1)\,c_1 + \cdots + (x_mx_1)\,c_m + (xx_1)\cdot(-1) &= 0,\\
&\ \,\vdots\\
(x_1x_m)\,c_1 + \cdots + (x_mx_m)\,c_m + (xx_m)\cdot(-1) &= 0,\\
x_1c_1 + \cdots + x_mc_m + x_S\cdot(-1) &= 0.
\end{aligned}\qquad (18)$$

Regarding this as a system of linear homogeneous equations with the non-zero solution $c_1, c_2, \ldots, c_m, -1$, we equate the determinant of the system to zero and obtain (after transposition with respect to the main diagonal):¹⁰

$$\begin{vmatrix} (x_1x_1) & \cdots & (x_1x_m) & x_1\\ \vdots & & \vdots & \vdots\\ (x_mx_1) & \cdots & (x_mx_m) & x_m\\ (xx_1) & \cdots & (xx_m) & x_S \end{vmatrix} = 0. \qquad (19)$$


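When we separate from this determinant the term containing $x_S$, we obtain (in a readily understandable notation):

$$x_S = -\frac{1}{G}\begin{vmatrix} (x_1x_1) & \cdots & (x_1x_m) & x_1\\ \vdots & & \vdots & \vdots\\ (x_mx_1) & \cdots & (x_mx_m) & x_m\\ (xx_1) & \cdots & (xx_m) & 0 \end{vmatrix}, \qquad (20)$$

where $G = G(x_1, x_2, \ldots, x_m)$ is the Gramian of the vectors $x_1, x_2, \ldots, x_m$ (in virtue of the linear independence of these vectors, $G \ne 0$). From (15) and (20), we find:

$$x_N = x - x_S = \frac{1}{G}\begin{vmatrix} (x_1x_1) & \cdots & (x_1x_m) & x_1\\ \vdots & & \vdots & \vdots\\ (x_mx_1) & \cdots & (x_mx_m) & x_m\\ (xx_1) & \cdots & (xx_m) & x \end{vmatrix}. \qquad (21)$$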


¹⁰ The determinant on the left-hand side of (19) is a vector whose i-th coordinate is obtained by replacing all the vectors $x_1, \ldots, x_m, x_S$ in the last column by their i-th coordinates ($i = 1, 2, \ldots, n$); the coordinates are taken in an arbitrary basis. To justify the transition from (18) to (19), it is sufficient to replace the vectors $x_1, \ldots, x_m, x_S$ by their i-th coordinates.




The formulas (20) and (21) express the projection $x_S$ of x onto the subspace S and the projecting vector $x_N$ in terms of the given vector x and the basis of S.
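The construction (16) to (18) amounts to solving the m linear equations (17) for $c_1, \ldots, c_m$. The sketch below does this by Gaussian elimination instead of expanding the determinants (20) and (21), which is the usual numerical choice; it assumes coordinates in an orthonormal basis, and the function names `solve` and `project` are hypothetical.

```python
def inner(x, y):
    """(xy) = sum x_i * conj(y_i): coordinates in an orthonormal basis, as in (10)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def solve(a, b):
    """Solve the square system a c = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(m[i][j]))
        m[j], m[p] = m[p], m[j]
        for i in range(j + 1, n):
            f = m[i][j] / m[j][j]
            for k in range(j, n + 1):
                m[i][k] -= f * m[j][k]
    c = [0] * n
    for j in reversed(range(n)):
        s = sum(m[j][k] * c[k] for k in range(j + 1, n))
        c[j] = (m[j][n] - s) / m[j][j]
    return c


def project(x, basis):
    """Return (x_S, x_N) with x = x_S + x_N, x_S in S = [basis], x_N orthogonal to S."""
    # Relations (17): c_1 (x_1 x_k) + ... + c_m (x_m x_k) = (x x_k), k = 1, ..., m.
    a = [[inner(xi, xk) for xi in basis] for xk in basis]
    b = [inner(x, xk) for xk in basis]
    c = solve(a, b)
    # Formula (16): x_S = c_1 x_1 + ... + c_m x_m.
    xs = [sum(ci * xi[t] for ci, xi in zip(c, basis)) for t in range(len(x))]
    xn = [xt - st for xt, st in zip(x, xs)]
    return xs, xn


xs, xn = project((1, 2, 3), [(1, 1, 0), (0, 1, 1)])
print(xs, xn)  # x_N is a multiple of (1, -1, 1), the normal of the plane S
```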


2. We draw attention to another important formula. We denote by h the length of the vector $x_N$. Then, by (15) and (21),

$$h^2 = (x_Nx_N) = (x_Nx) = \frac{1}{G}\begin{vmatrix} (x_1x_1) & \cdots & (x_1x_m) & (x_1x)\\ \vdots & & \vdots & \vdots\\ (x_mx_1) & \cdots & (x_mx_m) & (x_mx)\\ (xx_1) & \cdots & (xx_m) & (xx) \end{vmatrix},$$

i.e.,

$$h^2 = \frac{G(x_1, x_2, \ldots, x_m, x)}{G(x_1, x_2, \ldots, x_m)}. \qquad (22)$$

The quantity h can also be interpreted in the following way: Let the vectors $x_1, x_2, \ldots, x_m, x$ issue from a single point and construct on these vectors as edges an (m + 1)-dimensional parallelepiped. Then h is the height of this parallelepiped measured from the end of the edge x to the base S that passes through the edges $x_1, x_2, \ldots, x_m$.


Let y be an arbitrary vector of S and x an arbitrary vector of R. If all vectors start from the origin of coordinates of an n-dimensional point space, then $|x - y|$ and $|x - x_S|$ are equal to the value of the slant height and the height respectively from the endpoint of x to the hyperplane S.¹¹ Therefore, when we set down that the height is shorter than the slant height, we have:¹²

$$h = |x - x_S| \le |x - y|$$

(with equality only for $y = x_S$). Thus, among all vectors $y \in S$ the vector $x_S$ deviates the least from the given vector $x \in R$. The quantity $h = \sqrt{N(x - x_S)}$ is the mean-square error in the approximation $x \approx x_S$.¹³
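Formula (22) gives the distance h without computing $x_S$ at all. The following sketch (same assumptions as before: coordinates in an orthonormal basis, illustrative helper names) evaluates the quotient of Gramians and compares it with the deviation $N(x - y)$ for another vector y of S.

```python
def inner(x, y):
    """(xy) = sum x_i * conj(y_i) in an orthonormal basis."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(a[i][j]))
        if a[p][j] == 0:
            return 0
        if p != j:
            a[j], a[p] = a[p], a[j]
            d = -d
        d *= a[j][j]
        for i in range(j + 1, n):
            f = a[i][j] / a[j][j]
            for k in range(j, n):
                a[i][k] -= f * a[j][k]
    return d


def gramian(vectors):
    return det([[inner(xi, xk) for xk in vectors] for xi in vectors])


def distance_squared(x, basis):
    """h^2 = G(x_1, ..., x_m, x) / G(x_1, ..., x_m), formula (22); basis must be independent."""
    return gramian(list(basis) + [x]) / gramian(basis)


def deviation_squared(x, y):
    """N(x - y) = |x - y|^2."""
    d = [a - b for a, b in zip(x, y)]
    return inner(d, d).real


x = (1, 2, 3)
basis = [(1, 1, 0), (0, 1, 1)]
h2 = distance_squared(x, basis)             # the plane S has normal (1, -1, 1), so h^2 = 4/3
print(h2, deviation_squared(x, (1, 2, 1)))  # y = x_1 + x_2 lies in S and deviates more
```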


§ 5. The Geometrical Meaning of the Gramian and Some Inequalities 


1. We consider arbitrary vectors $x_1, x_2, \ldots, x_m$. Let us assume, to begin with, that they are linearly independent. In this case the Gramian formed from any of these vectors is different from zero. Then, when we set, in accordance with (22),


¹¹ See the example on p. 248.

¹² $N(x - y) = N(x_N + x_S - y) = Nx_N + N(x_S - y) \ge Nx_N = h^2$.

¹³ As regards the application of metrized functional spaces to problems of approximation of functions, see [1].


$$\frac{G(x_1, x_2, \ldots, x_{p+1})}{G(x_1, x_2, \ldots, x_p)} = h_p^2 > 0 \qquad (p = 1, 2, \ldots, m - 1), \qquad (23)$$

and multiply these inequalities and the inequality

$$G(x_1) = (x_1x_1) > 0, \qquad (24)$$

we obtain

$$G(x_1, x_2, \ldots, x_m) > 0.$$

Thus: The Gramian of linearly independent vectors is positive; that of linearly dependent vectors is zero. Negative Gramians do not exist.

Let us use the abbreviation $G_p = G(x_1, x_2, \ldots, x_p)$ ($p = 1, 2, \ldots, m$). Then, from (23) and (24), we have

$$\sqrt{G_1} = |x_1| = V_1, \qquad \sqrt{G_2} = V_1h_1 = V_2,$$

where $V_2$ is the area of the parallelogram spanned by $x_1$ and $x_2$. Further,

$$\sqrt{G_3} = V_2h_2 = V_3,$$

where $V_3$ is the volume of the parallelepiped spanned by $x_1, x_2, x_3$. Continuing further, we find:

$$\sqrt{G_4} = V_3h_3 = V_4$$

and, in general,

$$\sqrt{G_m} = V_{m-1}h_{m-1} = V_m. \qquad (25)$$

It is natural to call $V_m$ the volume of the m-dimensional parallelepiped spanned by the vectors $x_1, x_2, \ldots, x_m$.¹⁴

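We denote by $x_{1k}, x_{2k}, \ldots, x_{nk}$ the coordinates of $x_k$ ($k = 1, 2, \ldots, m$) in an orthonormal basis of R and set

$$X = \|x_{ik}\| \qquad (i = 1, 2, \ldots, n;\ k = 1, 2, \ldots, m).$$

Then, in consequence of (10),

$$G_m = |X^T\bar X|$$

and therefore (by the Binet–Cauchy formula for the determinant of a product of rectangular matrices, and by formula (25)),

$$V_m^2 = G_m = \sum_{1 \le i_1 < i_2 < \cdots < i_m \le n} \left|\det\begin{pmatrix} x_{i_11} & x_{i_12} & \cdots & x_{i_1m}\\ x_{i_21} & x_{i_22} & \cdots & x_{i_2m}\\ \vdots & \vdots & & \vdots\\ x_{i_m1} & x_{i_m2} & \cdots & x_{i_mm} \end{pmatrix}\right|^2. \qquad (26)$$

¹⁴ Formula (25) gives an inductive definition of the volume of an m-dimensional parallelepiped.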




This equation has the following geometric meaning: 


The square of the volume of a parallelepiped is equal to the sum of the squares of the volumes of its projections on all the m-dimensional coordinate subspaces. In particular, for m = n, it follows from (26) that

$$V_n = \left|\det\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n}\\ x_{21} & x_{22} & \cdots & x_{2n}\\ \vdots & \vdots & & \vdots\\ x_{n1} & x_{n2} & \cdots & x_{nn} \end{pmatrix}\right|. \qquad (27)$$

The formulas (20), (21), (22), (26), and (27) solve a number of fundamental metrical problems of n-dimensional unitary and n-dimensional euclidean analytical geometry.
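Formula (26) can be checked numerically by summing the squared moduli of all m-rowed minors of the coordinate matrix X and comparing the sum with the Gramian. In the sketch below `itertools.combinations` enumerates the index sets $i_1 < \cdots < i_m$; the helper names and the test vectors are illustrative.

```python
from itertools import combinations


def inner(x, y):
    """(xy) = sum x_i * conj(y_i) in an orthonormal basis."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(a[i][j]))
        if a[p][j] == 0:
            return 0
        if p != j:
            a[j], a[p] = a[p], a[j]
            d = -d
        d *= a[j][j]
        for i in range(j + 1, n):
            f = a[i][j] / a[j][j]
            for k in range(j, n):
                a[i][k] -= f * a[j][k]
    return d


def gramian(vectors):
    return det([[inner(xi, xk) for xk in vectors] for xi in vectors])


def volume_squared_by_minors(vectors):
    """Right-hand side of (26): sum over rows i_1 < ... < i_m of |minor of X|^2."""
    m, n = len(vectors), len(vectors[0])
    total = 0.0
    for rows in combinations(range(n), m):
        minor = [[vectors[k][i] for k in range(m)] for i in rows]  # entries x_{i k}
        total += abs(det(minor)) ** 2
    return total


vs = [(1, 1j, 0), (0, 2, 1j)]
print(volume_squared_by_minors(vs), gramian(vs))  # both approximately 6
```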


2. Let us return to the decomposition (15). This has the immediate consequence:

$$(xx) = (x_S + x_N,\ x_S + x_N) = (x_Sx_S) + (x_Nx_N) \ge (x_Nx_N) = h^2,$$

which, in conjunction with (22), gives an inequality (for arbitrary vectors $x_1, x_2, \ldots, x_m, x$)

$$G(x_1, x_2, \ldots, x_m, x) \le G(x_1, x_2, \ldots, x_m)\,G(x); \qquad (28)$$

the equality sign holds if and only if x is orthogonal to $x_1, x_2, \ldots, x_m$. From this we easily obtain the so-called Hadamard inequality

$$G(x_1, x_2, \ldots, x_m) \le G(x_1)\,G(x_2)\cdots G(x_m), \qquad (29)$$

where the equality sign holds if and only if the vectors $x_1, x_2, \ldots, x_m$ are pairwise orthogonal. The inequality (29) expresses the following fact, which is geometrically obvious:

The volume of a parallelepiped does not exceed the product of the lengths of its edges and is equal to it only when the parallelepiped is rectangular.


Hadamard’s inequality can be put into its usual form by setting m = n in (29) and introducing the determinant Δ formed from the coordinates $x_{1k}, x_{2k}, \ldots, x_{nk}$ of the vectors $x_k$ ($k = 1, 2, \ldots, n$) in some orthonormal basis:

$$\Delta = \begin{vmatrix} x_{11} & \cdots & x_{1n}\\ \vdots & & \vdots\\ x_{n1} & \cdots & x_{nn} \end{vmatrix}.$$

Then it follows from (27) and (29) that

$$|\Delta|^2 \le \prod_{k=1}^{n}\sum_{i=1}^{n} |x_{ik}|^2. \qquad (29')$$
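A direct numerical check of (29′) for small matrices: the columns of the matrix play the role of the coordinate vectors $x_k$. The function name `hadamard_sides` is hypothetical, and the two test matrices are arbitrary, one with non-orthogonal and one with orthogonal columns.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(a[i][j]))
        if a[p][j] == 0:
            return 0
        if p != j:
            a[j], a[p] = a[p], a[j]
            d = -d
        d *= a[j][j]
        for i in range(j + 1, n):
            f = a[i][j] / a[j][j]
            for k in range(j, n):
                a[i][k] -= f * a[j][k]
    return d


def hadamard_sides(a):
    """Return (|Delta|^2, prod over columns k of sum_i |x_ik|^2), the two sides of (29')."""
    n = len(a)
    lhs = abs(det(a)) ** 2
    rhs = 1.0
    for k in range(n):
        rhs *= sum(abs(a[i][k]) ** 2 for i in range(n))
    return lhs, rhs


print(hadamard_sides([[1, 2], [3, 4]]))   # strict inequality: about (4, 200)
print(hadamard_sides([[1, -1], [1, 1]]))  # orthogonal columns: both sides about 4
```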
3.¹⁵ We now turn to the inequality

$$G(x_{1S}, x_{2S}, \ldots, x_{mS}) \le G(x_1, x_2, \ldots, x_m). \qquad (30)$$

Here $x_{iS}$ is the orthogonal projection of $x_i$ onto a subspace S of R and $x_{iN} = x_i - x_{iS}$ ($i = 1, 2, \ldots, m$). If $G(x_1, x_2, \ldots, x_m) \ne 0$, then the equality sign holds in (30) if and only if $x_{iN} = 0$ ($i = 1, 2, \ldots, m$). If $G(x_1, x_2, \ldots, x_m) = 0$, then (30) implies, of course, that $G(x_{1S}, x_{2S}, \ldots, x_{mS}) = 0$.

In virtue of (25), the inequality (30) expresses the following geometric fact:

The volume of the orthogonal projection of a parallelepiped onto a subspace S does not exceed the volume of the given parallelepiped; these volumes are equal if and only if the given parallelepiped lies in S or has zero volume.

We prove (30) by induction on m.

The first step (m = 1) is trivial and yields the inequality

$$G(x_{1S}) \le G(x_1),$$

i.e., $|x_{1S}| \le |x_1|$ (see Fig. 5 on page 248).

We write the volume $\sqrt{G(x_1, x_2, \ldots, x_m)}$ of our parallelepiped as the product of the ‘base’ $\sqrt{G(x_1, x_2, \ldots, x_{m-1})}$ by the distance h of the endpoint of $x_m$ from the base:

$$\sqrt{G(x_1, \ldots, x_{m-1})}\cdot h = \sqrt{G(x_1, \ldots, x_m)}. \qquad (31)$$

If we now go over on the left-hand side of (31) from the vectors $x_i$ to their projections $x_{iS}$ ($i = 1, 2, \ldots, m$), then the first factor cannot increase, by the induction hypothesis, nor the second, by a simple geometric argument. But the product so obtained is the volume $\sqrt{G(x_{1S}, x_{2S}, \ldots, x_{mS})}$ of the parallelepiped projected onto the subspace S. Hence

$$\sqrt{G(x_{1S}, x_{2S}, \ldots, x_{mS})} \le \sqrt{G(x_1, x_2, \ldots, x_m)},$$

and by squaring both sides, we obtain (30). Our condition for the equality sign to hold follows immediately from the proof.


¹⁵ Subsections 3 and 4 have been modified in accordance with a correction published by the author in 1954 (Uspehi Mat. Nauk, vol. 9, no. 3).




4. Now we shall establish a generalization of Hadamard’s inequality which comprises both the inequalities (28) and (29):

$$G(x_1, x_2, \ldots, x_m) \le G(x_1, \ldots, x_p)\,G(x_{p+1}, \ldots, x_m), \qquad (32)$$

where the equality sign holds if and only if each vector $x_1, x_2, \ldots, x_p$ is orthogonal to each of the vectors $x_{p+1}, \ldots, x_m$ or one of the determinants $G(x_1, x_2, \ldots, x_p)$, $G(x_{p+1}, \ldots, x_m)$ vanishes.

The inequality (32) has the following geometric meaning:

The volume of a parallelepiped does not exceed the product of the volumes of two complementary ‘faces’ and is equal to this product if and only if these faces are orthogonal or at least one of them has volume zero.


Let us prove the inequality (32). Let p < m. If $G(x_1, x_2, \ldots, x_p) = 0$, then (32) holds with the equality sign. Let $G(x_1, x_2, \ldots, x_p) \ne 0$. Then the p vectors $x_1, x_2, \ldots, x_p$ are linearly independent and form a basis of a p-dimensional subspace T of R. The set of all vectors y of R that are orthogonal to T is easily seen also to form a subspace of R (the so-called orthogonal complement of T; for details, see § 8 of this Chapter). We denote it by S, and then R = T + S.

Since every vector of S is orthogonal to every vector of T, we can go over, in the Gramian $G(x_1, x_2, \ldots, x_m)$, which is the square of a certain volume, from the vectors $x_{p+1}, \ldots, x_m$ to their projections $x_{p+1,S}, \ldots, x_{mS}$ onto the subspace S (each difference $x_k - x_{kS}$ lies in T, i.e., is a linear combination of $x_1, \ldots, x_p$, and adding such combinations to the last m − p vectors does not change the Gramian):

$$G(x_1, \ldots, x_p, x_{p+1}, \ldots, x_m) = G(x_1, \ldots, x_p, x_{p+1,S}, \ldots, x_{mS}).$$

Since each $x_{kS}$ ($k > p$) is orthogonal to $x_1, \ldots, x_p$, the Gram matrix on the right-hand side of this equation is block-diagonal, so the Gramian splits:

$$G(x_1, \ldots, x_p, x_{p+1,S}, \ldots, x_{mS}) = G(x_1, \ldots, x_p)\,G(x_{p+1,S}, \ldots, x_{mS}).$$

If we now go back from the projections to the original vectors and use (30), then we obtain

$$G(x_1, \ldots, x_p)\,G(x_{p+1,S}, \ldots, x_{mS}) \le G(x_1, \ldots, x_p)\,G(x_{p+1}, \ldots, x_m).$$

The equality sign holds in two cases: 1. When $G(x_{p+1}, \ldots, x_m) = 0$, for then it is obvious that $G(x_{p+1,S}, \ldots, x_{mS}) = 0$; and 2. When $x_{iS} = x_i$ ($i = p + 1, \ldots, m$), i.e., when the vectors $x_{p+1}, \ldots, x_m$ belong to S or, what is the same, each vector $x_{p+1}, \ldots, x_m$ is orthogonal to every vector $x_1, x_2, \ldots, x_p$ (the case $G(x_1, x_2, \ldots, x_p) = 0$ has been considered at the beginning of the proof). By combining the last three relations we obtain the generalized Hadamard inequality (32) and the conditions for the equality sign to hold. This completes the proof.


5. The generalized Hadamard inequality (32) can also be put into analytic form.

Let $\sum_{i,k=1}^{n} h_{ik}x_i\bar x_k$ be an arbitrary positive-definite hermitian form. By regarding $x_1, x_2, \ldots, x_n$ as the coordinates, in a basis $e_1, e_2, \ldots, e_n$, of a vector x in an n-dimensional space R, we take $\sum_{i,k=1}^{n} h_{ik}x_i\bar x_k$ as the fundamental metric form of R (see p. 244). Then R becomes a unitary space. We apply the generalized Hadamard inequality to the basis vectors $e_1, e_2, \ldots, e_n$:

$$G(e_1, e_2, \ldots, e_n) \le G(e_1, \ldots, e_p)\,G(e_{p+1}, \ldots, e_n).$$

Setting $H = \|h_{ik}\|_1^n$ and noting that $(e_ie_k) = h_{ik}$ ($i, k = 1, 2, \ldots, n$), we can rewrite the latter inequality as follows:

$$H\begin{pmatrix}1 & 2 & \cdots & n\\ 1 & 2 & \cdots & n\end{pmatrix} \le H\begin{pmatrix}1 & 2 & \cdots & p\\ 1 & 2 & \cdots & p\end{pmatrix} H\begin{pmatrix}p+1 & \cdots & n\\ p+1 & \cdots & n\end{pmatrix} \qquad (p < n). \qquad (33)$$

Here the equality sign holds if and only if $h_{ik} = h_{ki} = 0$ ($i = 1, 2, \ldots, p$; $k = p + 1, \ldots, n$).

The inequality (33) holds for the coefficient matrix $H = \|h_{ik}\|_1^n$ of an arbitrary positive-definite hermitian form. In particular, (33) holds if H is the real coefficient matrix of a positive-definite quadratic form $\sum_{i,k=1}^{n} h_{ik}x_ix_k$.¹⁶
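The analytic form (33) can be checked in the same way for a positive-definite hermitian matrix H: the sketch compares the full determinant with the product of the two complementary principal minors. The test matrices are arbitrary examples chosen for illustration, and `det` is the same elimination routine as in the earlier sketches.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(a[i][j]))
        if a[p][j] == 0:
            return 0
        if p != j:
            a[j], a[p] = a[p], a[j]
            d = -d
        d *= a[j][j]
        for i in range(j + 1, n):
            f = a[i][j] / a[j][j]
            for k in range(j, n):
                a[i][k] -= f * a[j][k]
    return d


def principal_minor(h, idx):
    """Principal minor of h with rows and columns idx."""
    return det([[h[i][k] for k in idx] for i in idx])


def generalized_hadamard(h, p):
    """Return both sides of (33): (|H|, H(1..p) * H(p+1..n)); real for hermitian H."""
    n = len(h)
    whole = principal_minor(h, range(n))
    split = principal_minor(h, range(p)) * principal_minor(h, range(p, n))
    return whole.real, split.real


H = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]          # positive-definite quadratic form
print(generalized_hadamard(H, 1))              # about (4, 6)
print(generalized_hadamard([[2, 1j], [-1j, 2]], 1))  # hermitian case: about (3, 4)
```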
6. We remind the reader of Schwarz’s inequality:†

For arbitrary vectors $x, y \in R$

$$|(xy)|^2 \le Nx\,Ny, \qquad (34)$$

and the equality sign holds only if the vectors x and y differ only by a scalar factor.

The validity of Schwarz’s inequality follows easily from the inequality established above:

$$G(x, y) = \begin{vmatrix} (xx) & (xy)\\ (yx) & (yy) \end{vmatrix} \ge 0,$$

since, by $(yx) = \overline{(xy)}$, this determinant equals $Nx\,Ny - |(xy)|^2$.


By analogy with the scalar product of vectors in a three-dimensional euclidean space, we can introduce in an n-dimensional unitary space the ‘angle’ θ between the vectors x and y by defining¹⁷

$$\cos^2\theta = \frac{|(xy)|^2}{Nx\,Ny}.$$

From Schwarz’s inequality it follows that θ is real.

¹⁶ An analytical approach to the generalized Hadamard inequality can be found in the book [17], § 8.

† In the Russian literature, this is known as Bunyakovskii’s inequality.


§ 6. Orthogonalization of a Sequence of Vectors 


1. The smallest subspace containing the vectors $x_1, x_2, \ldots, x_p$ will be denoted by $[x_1, x_2, \ldots, x_p]$. This subspace consists of all possible linear combinations $c_1x_1 + c_2x_2 + \cdots + c_px_p$ of the vectors $x_1, x_2, \ldots, x_p$ ($c_1, c_2, \ldots, c_p$ are complex numbers).¹⁸ If $x_1, x_2, \ldots, x_p$ are linearly independent, then they form a basis of $[x_1, x_2, \ldots, x_p]$. In that case, the subspace is p-dimensional.

Two sequences of vectors

$$X:\ x_1, x_2, \ldots; \qquad Y:\ y_1, y_2, \ldots,$$

containing an equal number of vectors, finite or infinite, will be called equivalent if for all p

$$[x_1, x_2, \ldots, x_p] = [y_1, y_2, \ldots, y_p] \qquad (p = 1, 2, \ldots).$$

A sequence of vectors

$$X:\ x_1, x_2, \ldots$$

will be called non-degenerate if for every p the vectors $x_1, x_2, \ldots, x_p$ are linearly independent.

A sequence of vectors is called orthogonal if any two vectors of the 
sequence are orthogonal. 

By orthogonalization of a sequence of vectors we mean a process of re- 
placing the sequence by an equivalent orthogonal sequence. 


THEOREM 2: Every non-degenerate sequence of vectors can be orthogona- 
lized. The orthogonalizing process leads to vectors that are uniquely deter- 
mined to within scalar multiples. 


¹⁷ In the case of a euclidean space, the angle θ between the vectors x and y is defined by the formula

$$\cos\theta = \frac{(xy)}{|x|\,|y|}.$$

¹⁸ In the case of a euclidean space, these numbers are real.


Proof. 1) Let us prove the second part of the theorem first. Suppose that two orthogonalizing sequences $y_1, y_2, \ldots$ (Y) and $z_1, z_2, \ldots$ (Z) are equivalent to one and the same non-degenerate sequence $x_1, x_2, \ldots$ (X). Then Y and Z are equivalent to each other. Therefore for every p there exist numbers $c_{p1}, c_{p2}, \ldots, c_{pp}$ such that

$$z_p = c_{p1}y_1 + c_{p2}y_2 + \cdots + c_{p,p-1}y_{p-1} + c_{pp}y_p \qquad (p = 1, 2, \ldots).$$

When we form the scalar products of both sides of this equation by $y_1, y_2, \ldots, y_{p-1}$ and take account of the orthogonality of Y and of the relation

$$z_p \perp [z_1, z_2, \ldots, z_{p-1}] = [y_1, y_2, \ldots, y_{p-1}],$$

we obtain $c_{p1} = c_{p2} = \cdots = c_{p,p-1} = 0$, and therefore

$$z_p = c_{pp}y_p \qquad (p = 1, 2, \ldots).$$


2) A concrete form of the orthogonalizing process for an arbitrary non-degenerate sequence of vectors $x_1, x_2, \ldots$ (X) is given by the following construction.

Let

$$S_p = [x_1, x_2, \ldots, x_p], \qquad G_p = G(x_1, x_2, \ldots, x_p) \qquad (p = 1, 2, \ldots).$$

We project the vector $x_p$ orthogonally onto the subspace $S_{p-1}$ ($p = 1, 2, \ldots$):¹⁹

$$x_p = x_{pS_{p-1}} + x_{pN}, \qquad x_{pS_{p-1}} \in S_{p-1}, \quad x_{pN} \perp S_{p-1} \qquad (p = 1, 2, \ldots).$$

We set

$$y_p = \lambda_px_{pN} \qquad (p = 1, 2, \ldots;\ x_{1N} = x_1),$$

where $\lambda_p$ ($p = 1, 2, \ldots$) are arbitrary non-zero numbers. Then it is easily seen that

$$Y:\ y_1, y_2, \ldots$$

is an orthogonal sequence equivalent to X. This proves Theorem 2.
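By (21)

$$x_{pN} = \frac{1}{G_{p-1}}\begin{vmatrix} (x_1x_1) & \cdots & (x_1x_{p-1}) & x_1\\ \vdots & & \vdots & \vdots\\ (x_{p-1}x_1) & \cdots & (x_{p-1}x_{p-1}) & x_{p-1}\\ (x_px_1) & \cdots & (x_px_{p-1}) & x_p \end{vmatrix} \qquad (p = 1, 2, \ldots;\ G_0 = 1).$$

¹⁹ For p = 1 we set $x_{1S_0} = 0$ and $x_{1N} = x_1$.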


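Setting $\lambda_p = G_{p-1}$ ($p = 1, 2, \ldots$; $G_0 = 1$), we obtain the following formulas for the vectors of the orthogonalized sequence:

$$y_1 = x_1, \qquad y_p = \begin{vmatrix} (x_1x_1) & \cdots & (x_1x_{p-1}) & x_1\\ \vdots & & \vdots & \vdots\\ (x_{p-1}x_1) & \cdots & (x_{p-1}x_{p-1}) & x_{p-1}\\ (x_px_1) & \cdots & (x_px_{p-1}) & x_p \end{vmatrix} \qquad (p = 2, 3, \ldots). \qquad (35)$$

By (22),

$$Ny_p = G_{p-1}^2\,Nx_{pN} = G_{p-1}^2\,\frac{G_p}{G_{p-1}} = G_{p-1}G_p \qquad (p = 1, 2, \ldots;\ G_0 = 1). \qquad (36)$$

Therefore, setting

$$z_p = \frac{y_p}{\sqrt{G_{p-1}G_p}} \qquad (p = 1, 2, \ldots), \qquad (37)$$

we obtain an orthonormal sequence Z equivalent to the given sequence X.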
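In numerical work the orthogonalizing process of Theorem 2 is usually carried out by subtracting projections onto the orthonormal vectors already constructed (the classical Gram–Schmidt process) rather than by evaluating the determinants (35). The sketch below does that and also records $Nx_{pN}$, which by (22) should equal $G_p/G_{p-1}$; multiplying by $G_{p-1}^2$ gives (36). The function name `orthonormalize` and the test vectors are illustrative.

```python
def inner(x, y):
    """(xy) = sum x_i * conj(y_i) in an orthonormal basis."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def orthonormalize(xs):
    """Orthonormalize a non-degenerate sequence; also return N x_{pN} for each p."""
    zs, norms_sq = [], []
    for x in xs:
        # x_{pN} = x_p minus its projection onto S_{p-1} = [z_1, ..., z_{p-1}]
        xn = list(x)
        for z in zs:
            c = inner(x, z)
            xn = [a - c * b for a, b in zip(xn, z)]
        nsq = inner(xn, xn).real
        norms_sq.append(nsq)
        zs.append([a / nsq ** 0.5 for a in xn])  # z_p = x_{pN} / |x_{pN}|
    return zs, norms_sq


zs, ns = orthonormalize([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
# Here G_1, G_2, G_3 = 2, 3, 4, so ns should be close to [2, 3/2, 4/3].
print(ns)
```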

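Example. In the space of real functions that are sectionally continuous in the interval [−1, +1], we define the scalar product

$$(f, g) = \int_{-1}^{+1} f(x)\,g(x)\,dx.$$

We consider the non-degenerate sequence of ‘vectors’

$$1,\ x,\ x^2,\ x^3,\ \ldots.$$

We orthogonalize this sequence by the formulas (35):

$$y_0 = 1, \qquad y_n = \begin{vmatrix} (1, 1) & (1, x) & \cdots & (1, x^{n-1}) & 1\\ (x, 1) & (x, x) & \cdots & (x, x^{n-1}) & x\\ \vdots & \vdots & & \vdots & \vdots\\ (x^n, 1) & (x^n, x) & \cdots & (x^n, x^{n-1}) & x^n \end{vmatrix} \qquad (n = 1, 2, \ldots),$$

where $(x^i, x^k) = \int_{-1}^{+1} x^{i+k}\,dx$ equals $\frac{2}{i+k+1}$ when i + k is even and 0 when i + k is odd.

These orthogonal polynomials coincide, apart from constant factors, with the well-known Legendre polynomials:²⁰

$$P_0(x) = 1, \qquad P_m(x) = \frac{1}{2^m m!}\,\frac{d^m (x^2 - 1)^m}{dx^m} \qquad (m = 1, 2, \ldots).$$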
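The Legendre example can be reproduced in exact rational arithmetic. In this sketch a polynomial is a list of `Fraction` coefficients (entry i multiplies $x^i$), the scalar product is the integral over [−1, 1], and each new power is reduced by its projections onto the previous results. This corresponds to the choice $\lambda_p = 1$ rather than $\lambda_p = G_{p-1}$, so the results are monic and differ from (35) and from $P_n$ only by constant factors. The function names are illustrative.

```python
from fractions import Fraction


def inner(p, q):
    """(f, g) = integral over [-1, 1] of f(x) g(x) dx for coefficient lists p, q."""
    s = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers integrate to 0 on [-1, 1]
                s += a * b * Fraction(2, i + j + 1)
    return s


def orthogonalize_powers(count):
    """Orthogonalize 1, x, x^2, ... (the first `count` powers); each result is monic."""
    ys = []
    for k in range(count):
        power = [Fraction(0)] * k + [Fraction(1)]  # the monomial x**k
        y = list(power)
        for q in ys:
            c = inner(power, q) / inner(q, q)      # projection coefficient
            for i, a in enumerate(q):
                y[i] -= c * a
        ys.append(y)
    return ys


ys = orthogonalize_powers(4)
# ys[2] represents x^2 - 1/3 = (2/3) P_2 and ys[3] represents x^3 - (3/5) x = (2/5) P_3.
print(ys[2], ys[3])
```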
The same sequence of powers $1, x, x^2, \ldots$ in a different metric

$$(f, g) = \int_a^b f(x)\,g(x)\,\tau(x)\,dx$$

(where $\tau(x) \ge 0$ for $a \le x \le b$) gives another sequence of orthogonal polynomials.

For example, if a = −1, b = 1 and $\tau(x) = \frac{1}{\sqrt{1 - x^2}}$, then we obtain the Tchebyshev (Chebyshev) polynomials:

$$T_n(x) = \frac{1}{2^{n-1}}\cos(n \arccos x).$$

For $a = -\infty$, $b = +\infty$ and $\tau(x) = e^{-x^2}$ we obtain the Hermite polynomials, etc.²¹

²⁰ See [12], p. 77ff.


2. We shall now take note of the so-called Bessel inequality for an orthonormal sequence of vectors $z_1, z_2, \ldots$ (Z). Let x be an arbitrary vector. We denote by $\xi_p$ the projection of x onto $z_p$:

$$\xi_p = (xz_p) \qquad (p = 1, 2, \ldots).$$

Then the projection of x onto the subspace $S_p = [z_1, z_2, \ldots, z_p]$ can be represented in the form (see (20))

$$x_{S_p} = \xi_1z_1 + \xi_2z_2 + \cdots + \xi_pz_p \qquad (p = 1, 2, \ldots).$$

But $Nx_{S_p} = |\xi_1|^2 + |\xi_2|^2 + \cdots + |\xi_p|^2 \le Nx$. Therefore, for every p,

$$|\xi_1|^2 + |\xi_2|^2 + \cdots + |\xi_p|^2 \le Nx. \qquad (38)$$

This is Bessel’s inequality.

In the case of a space of finite dimension n, this inequality has a completely obvious geometrical meaning. For p = n it goes over into the theorem of Pythagoras

$$|\xi_1|^2 + |\xi_2|^2 + \cdots + |\xi_n|^2 = |x|^2.$$

In the case of an infinite-dimensional space and an infinite sequence Z, it follows from (38) that the series $\sum_{k=1}^{\infty} |\xi_k|^2$ converges and that

$$\sum_{k=1}^{\infty} |\xi_k|^2 \le Nx = |x|^2.$$
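In a finite-dimensional space Bessel’s inequality and its limiting case p = n are easy to observe numerically. The sketch below uses an arbitrarily chosen orthonormal basis of the three-dimensional euclidean space and a hypothetical helper `bessel_partial_sums` that returns $|\xi_1|^2 + \cdots + |\xi_p|^2$ for p = 1, 2, 3.

```python
import math


def inner(x, y):
    """(xy) = sum x_i * conj(y_i) in an orthonormal basis."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def bessel_partial_sums(x, zs):
    """Partial sums |xi_1|^2 + ... + |xi_p|^2 with xi_k = (x z_k)."""
    sums, s = [], 0.0
    for z in zs:
        xi = inner(x, z)
        s += abs(xi) ** 2
        sums.append(s)
    return sums


r = 1 / math.sqrt(2)
zs = [(1, 0, 0), (0, r, r), (0, r, -r)]  # an orthonormal basis of the space
x = (1, 2, 3)
print(bessel_partial_sums(x, zs), inner(x, x))  # each sum <= Nx = 14; the last one equals it
```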
Let us form the series

$$\sum_{k=1}^{\infty} \xi_kz_k.$$

For every p the p-th partial sum of this series,

$$\xi_1z_1 + \xi_2z_2 + \cdots + \xi_pz_p,$$

is the projection $x_{S_p}$ of x onto the subspace $S_p = [z_1, z_2, \ldots, z_p]$ and is therefore the best approximation to the vector x in this subspace:

$$N\Bigl(x - \sum_{k=1}^{p} \xi_kz_k\Bigr) \le N\Bigl(x - \sum_{k=1}^{p} c_kz_k\Bigr),$$

where $c_1, c_2, \ldots, c_p$ are arbitrary complex numbers. Let us calculate the corresponding mean-square deviation $\delta_p$:

$$\delta_p^2 = N\Bigl(x - \sum_{k=1}^{p} \xi_kz_k\Bigr) = \Bigl(x - \sum_{k=1}^{p} \xi_kz_k,\ x - \sum_{k=1}^{p} \xi_kz_k\Bigr) = Nx - \sum_{k=1}^{p} |\xi_k|^2.$$

Hence

$$\lim_{p\to\infty} \delta_p^2 = Nx - \sum_{k=1}^{\infty} |\xi_k|^2.$$

If

$$\lim_{p\to\infty} \delta_p = 0,$$

then we say that the series $\sum_{k=1}^{\infty} \xi_kz_k$ converges in the mean (or converges with respect to the norm) to the vector x.

²¹ For further details see [12], Chapter II, § 9.
In this case we have an equality for the vector x in R (the theorem of Pythagoras in an infinite-dimensional space!):

$$Nx = |x|^2 = \sum_{k=1}^{\infty} |\xi_k|^2. \qquad (39)$$

If for every vector x of R the series $\sum_{k=1}^{\infty} \xi_kz_k$ converges in the mean to x, then the orthonormal sequence of vectors $z_1, z_2, \ldots$ is called complete. In this case, when we replace x in (39) by x + y and use (39) three times, for N(x + y), Nx, and Ny (in a unitary space we also replace x by x + iy, which recovers the imaginary part), then we easily obtain:

$$(xy) = \sum_{k=1}^{\infty} \xi_k\bar\eta_k, \qquad \xi_k = (xz_k),\quad \eta_k = (yz_k) \qquad (k = 1, 2, \ldots). \qquad (40)$$




Example. We consider the space of all complex functions f(t) (t is a real variable) that are sectionally continuous in the closed interval [0, 2π]. Let us define the norm of f(t) by

$$Nf = \int_0^{2\pi} |f(t)|^2\,dt.$$

Correspondingly, we have the formula

$$(f, g) = \int_0^{2\pi} f(t)\,\overline{g(t)}\,dt$$

for the scalar product of two functions f(t) and g(t).

We take the infinite sequence of functions

$$\frac{1}{\sqrt{2\pi}}\,e^{ikt} \qquad (k = 0, \pm 1, \pm 2, \ldots).$$

These functions form an orthonormal sequence, because

$$\int_0^{2\pi} e^{i\mu t}\,\overline{e^{i\nu t}}\,dt = \int_0^{2\pi} e^{i(\mu - \nu)t}\,dt = \begin{cases} 0 & \text{for } \mu \ne \nu,\\ 2\pi & \text{for } \mu = \nu. \end{cases}$$

The series

$$\sum_{k=-\infty}^{+\infty} f_ke^{ikt} \qquad \Bigl(f_k = \frac{1}{2\pi}\int_0^{2\pi} f(t)\,e^{-ikt}\,dt;\ k = 0, \pm 1, \pm 2, \ldots\Bigr)$$

converges in the mean to f(t) in the interval [0, 2π]. This series is called the Fourier series of f(t) and the coefficients $f_k$ ($k = 0, \pm 1, \pm 2, \ldots$) are called the Fourier coefficients of f(t).

In the theory of Fourier series it is proved that the system of functions $e^{ikt}$ ($k = 0, \pm 1, \pm 2, \ldots$) is complete.²²

The condition of completeness gives Parseval’s equality (see (40)):

$$\int_0^{2\pi} f(t)\,\overline{g(t)}\,dt = \frac{1}{2\pi}\sum_{k=-\infty}^{+\infty} \int_0^{2\pi} f(t)\,e^{-ikt}\,dt\cdot\overline{\int_0^{2\pi} g(t)\,e^{-ikt}\,dt}.$$

If f(t) is a real function, then $f_0$ is real, and $f_k$ and $f_{-k}$ are conjugate complex numbers. Setting

$$f_k = \frac{1}{2\pi}\int_0^{2\pi} f(t)\,e^{-ikt}\,dt = \frac{1}{2}(a_k - ib_k),$$

where

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\cos kt\,dt, \qquad b_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\sin kt\,dt \qquad (k = 0, 1, 2, \ldots),$$

we have

$$f_ke^{ikt} + f_{-k}e^{-ikt} = a_k\cos kt + b_k\sin kt \qquad (k = 1, 2, \ldots).$$

Therefore, for a real function f(t) the Fourier series assumes the form

$$\frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k\cos kt + b_k\sin kt).$$

²² See, for example, [12], Chapter II.
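The coefficients $a_k$, $b_k$ and $f_k$ can be approximated by numerical quadrature. This sketch uses the rectangle rule with equally spaced nodes on $[0, 2\pi)$, which reproduces the integrals up to rounding when the integrand is a trigonometric polynomial of degree below the number of nodes; the quadrature rule, the node count and the test function are our assumptions, not part of the text.

```python
import cmath
import math


def fourier_real(f, kmax, n_points=64):
    """Approximate a_k, b_k (k = 0..kmax) with the rectangle rule on [0, 2*pi)."""
    h = 2 * math.pi / n_points
    ts = [j * h for j in range(n_points)]
    vals = [f(t) for t in ts]
    a = [h / math.pi * sum(v * math.cos(k * t) for v, t in zip(vals, ts)) for k in range(kmax + 1)]
    b = [h / math.pi * sum(v * math.sin(k * t) for v, t in zip(vals, ts)) for k in range(kmax + 1)]
    return a, b


def fourier_complex(f, k, n_points=64):
    """Approximate f_k = (1 / 2pi) * integral of f(t) e^{-ikt} dt with the same rule."""
    h = 2 * math.pi / n_points
    s = sum(f(j * h) * cmath.exp(-1j * k * j * h) for j in range(n_points))
    return s * h / (2 * math.pi)


def f(t):
    return 1 + 2 * math.cos(t) + 3 * math.sin(2 * t)


a, b = fourier_real(f, 3)
print(a, b)  # expected, up to rounding: a_0 = 2, a_1 = 2, b_2 = 3, all others 0
```

The relation $f_k = \frac12(a_k - ib_k)$ can be confirmed with `fourier_complex(f, 2)`, which should be close to $-1.5i$.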


§ 7. Orthonormal Bases 


1. A basis of any finite-dimensional subspace S in a unitary or a euclidean space R is a non-degenerate sequence of vectors and therefore, by Theorem 2 of the preceding section, can be orthogonalized and normalized. Thus: Every finite-dimensional subspace S (and, in particular, the whole space R if it is finite-dimensional) has an orthonormal basis.


Let $e_1, e_2, \ldots, e_n$ be an orthonormal basis of R. We denote by $x_1, x_2, \ldots, x_n$ the coordinates of an arbitrary vector x in this basis:

$$x = \sum_{k=1}^{n} x_ke_k.$$

Multiplying both sides of this equation on the right by $e_k$ and taking into account that the basis is orthonormal, we easily find:

$$x_k = (xe_k) \qquad (k = 1, 2, \ldots, n);$$

i.e., in an orthonormal basis the coordinates of a vector are equal to its projections onto the corresponding basis vectors:

$$x = \sum_{k=1}^{n} (xe_k)\,e_k. \qquad (41)$$
k=l 
Let $x_1, x_2, \ldots, x_n$ and $x_1', x_2', \ldots, x_n'$ be the coordinates of one and the same vector x in two different orthonormal bases $e_1, e_2, \ldots, e_n$ and $e_1', e_2', \ldots, e_n'$ of a unitary space R. The formulas for the coordinate transformation have the form

$$x_i = \sum_{k=1}^{n} u_{ik}x_k' \qquad (i = 1, 2, \ldots, n). \qquad (42)$$

Here the coefficients $u_{1k}, u_{2k}, \ldots, u_{nk}$ that form the k-th column of the matrix $U = \|u_{ik}\|_1^n$ are easily seen to be the coordinates of the vector $e_k'$ in the basis $e_1, e_2, \ldots, e_n$. Therefore, when we write down the condition for the basis $e_1', e_2', \ldots, e_n'$ to be orthonormal in terms of coordinates (see (10)), we obtain the relations

$$\sum_{i=1}^{n} u_{ik}\bar u_{il} = \delta_{kl} = \begin{cases} 1 & \text{for } k = l,\\ 0 & \text{for } k \ne l. \end{cases} \qquad (43)$$

A transformation (42) in which the coefficients satisfy the conditions (43) is called unitary and the corresponding matrix U is called a unitary matrix. Thus: In an n-dimensional unitary space the transition from one orthonormal basis to another is effected by a unitary coordinate transformation.

Let R be an n-dimensional euclidean space. The transition from one orthonormal basis of R to another is effected by a coordinate transformation

$$x_i = \sum_{k=1}^{n} v_{ik}x_k' \qquad (i = 1, 2, \ldots, n) \qquad (44)$$

whose coefficients are connected by the relation

$$\sum_{i=1}^{n} v_{ik}v_{il} = \delta_{kl} \qquad (k, l = 1, 2, \ldots, n). \qquad (45)$$

Such a coordinate transformation is called orthogonal and the corresponding matrix V is called an orthogonal matrix.


2. We note an interesting matrix method of writing the orthogonalizing process. Let $A = \|a_{ik}\|_1^n$ be an arbitrary non-singular matrix ($|A| \ne 0$) with complex elements. We consider a unitary space R with an orthonormal basis $e_1, e_2, \ldots, e_n$ and define the linearly independent vectors $a_1, a_2, \ldots, a_n$ by the equations

$$a_k = \sum_{i=1}^{n} a_{ik}e_i \qquad (k = 1, 2, \ldots, n).$$

Let us perform the orthogonalizing process on the vectors $a_1, a_2, \ldots, a_n$. The orthonormal basis of R so obtained we shall denote by $u_1, u_2, \ldots, u_n$. Suppose we have

$$u_k = \sum_{i=1}^{n} u_{ik}e_i \qquad (k = 1, 2, \ldots, n).$$

Then

$$[a_1, a_2, \ldots, a_p] = [u_1, u_2, \ldots, u_p] \qquad (p = 1, 2, \ldots, n),$$

i.e.,

$$\begin{aligned} a_1 &= c_{11}u_1,\\ a_2 &= c_{12}u_1 + c_{22}u_2,\\ &\ \,\vdots\\ a_n &= c_{1n}u_1 + c_{2n}u_2 + \cdots + c_{nn}u_n, \end{aligned}$$

where the $c_{ik}$ ($i, k = 1, 2, \ldots, n$; $i \le k$) are certain complex numbers.

Setting $c_{ik} = 0$ for $i > k$, we have:

$$a_k = \sum_{p=1}^{n} c_{pk}u_p \qquad (k = 1, 2, \ldots, n).$$

When we go over to coordinates and introduce the upper triangular matrix $C = \|c_{ik}\|_1^n$ and the unitary matrix $U = \|u_{ik}\|_1^n$, we obtain

$$a_{ik} = \sum_{p=1}^{n} u_{ip}c_{pk} \qquad (i, k = 1, 2, \ldots, n),$$

or

$$A = UC. \qquad (*)$$

According to this formula: Every non-singular matrix $A = \|a_{ik}\|_1^n$ can be represented in the form of a product of a unitary matrix U and an upper triangular matrix C.

Since the orthogonalizing process determines the vectors $u_1, u_2, \ldots, u_n$ uniquely, apart from scalar multipliers $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ ($|\varepsilon_i| = 1$; $i = 1, 2, \ldots, n$), the factors U and C in (*) are uniquely determined apart from a diagonal factor $M = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n\}$:

$$U = U_1M, \qquad C = M^{-1}C_1.$$

This can also be shown directly.
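The factorization (*) is what numerical libraries call a QR decomposition. The sketch below obtains it as described in the text: it orthonormalizes the columns $a_k$ of A and records the coefficients $c_{pk} = (a_ku_p)$ of $a_k$ with respect to the $u_p$. It assumes $|A| \ne 0$ (no zero pivot is handled) and chooses the diagonal of C positive, which fixes the diagonal factor M; `qr_gram_schmidt` and `matmul` are hypothetical helper names.

```python
def inner(x, y):
    """(xy) = sum x_i * conj(y_i) in an orthonormal basis."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def qr_gram_schmidt(a):
    """A = U C with U unitary and C upper triangular with positive diagonal (|A| != 0)."""
    n = len(a)
    cols = [[a[i][k] for i in range(n)] for k in range(n)]  # a_k = k-th column of A
    us = []
    c = [[0j] * n for _ in range(n)]                       # c_ik = 0 for i > k
    for k, col in enumerate(cols):
        v = list(col)
        for p, u in enumerate(us):
            c[p][k] = inner(col, u)                        # c_pk = (a_k u_p)
            v = [vi - c[p][k] * ui for vi, ui in zip(v, u)]
        c[k][k] = inner(v, v).real ** 0.5                  # length of the projecting vector
        us.append([vi / c[k][k] for vi in v])
    u_mat = [[us[k][i] for k in range(n)] for i in range(n)]  # u_ik = i-th coordinate of u_k
    return u_mat, c


def matmul(x, y):
    return [[sum(x[i][p] * y[p][k] for p in range(len(y))) for k in range(len(y[0]))]
            for i in range(len(x))]


A = [[1, 2j], [1j, 3]]
U, C = qr_gram_schmidt(A)
print(matmul(U, C))  # reproduces A up to rounding
```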


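Note 1. If A is a real matrix, the factors U and C in (*) can be chosen to be real. In this case, U is an orthogonal matrix.

Note 2. The formula (*) also remains valid for a singular matrix A ($|A| = 0$). This can be seen by setting $A = \lim_{m\to\infty} A_m$, where $|A_m| \ne 0$ ($m = 1, 2, \ldots$). Then $A_m = U_mC_m$ ($m = 1, 2, \ldots$). When we select from the sequence $\{U_m\}$ a convergent subsequence $\{U_{m_p}\}$ ($\lim_{p\to\infty} U_{m_p} = U$; such a subsequence exists because, by (43), the elements of a unitary matrix do not exceed 1 in modulus) and proceed to the limit, then we obtain from the equation $A_{m_p} = U_{m_p}C_{m_p}$ for $p \to \infty$ the required decomposition $A = UC$ (the limit U is unitary, and $C_{m_p} = U_{m_p}^{-1}A_{m_p}$ converges to the upper triangular matrix $C = U^{-1}A$). However, in the case $|A| = 0$ the factors U and C are no longer uniquely determined to within a diagonal factor M.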


Note 3. Instead of (*) we can also obtain a formula

$$A = DW, \qquad (**)$$

where D is a lower triangular matrix and W a unitary matrix. For when we apply the formula (*) that was established above to the transposed matrix $A^T$,

$$A^T = UC,$$

and then set $W = U^T$, $D = C^T$, we obtain (**).²³


§ 8. The Adjoint Operator 


1. Let A be a linear operator in an n-dimensional unitary space.

DEFINITION 4: A linear operator $A^*$ is called adjoint to the operator A if and only if for any two vectors x, y of R

$$(Ax, y) = (x, A^*y). \qquad (46)$$

We shall show that for every linear operator A there exists one and only one adjoint operator $A^*$. To prove this, we take an orthonormal basis $e_1, e_2, \ldots, e_n$ in R. Then (see (41)) the required operator $A^*$ and an arbitrary vector y of R must satisfy the equation

$$A^*y = \sum_{k=1}^{n} (A^*y, e_k)\,e_k.$$

Since, by (46), $(A^*y, e_k) = \overline{(e_k, A^*y)} = \overline{(Ae_k, y)} = (y, Ae_k)$, this can be rewritten as follows:

$$A^*y = \sum_{k=1}^{n} (y, Ae_k)\,e_k. \qquad (47)$$

We now take (47) as the definition of an operator $A^*$.

It is easy to verify that the operator $A^*$ so defined is linear and satisfies (46) for arbitrary vectors x and y of R. Moreover, (47) determines the operator $A^*$ uniquely. Thus the existence and uniqueness of the adjoint operator $A^*$ is established.

Let A be a linear operator in a unitary space and let $A = \|a_{ik}\|_1^n$ be the corresponding matrix in an orthonormal basis $e_1, e_2, \ldots, e_n$. Then, by applying the formula (41) to the vector $Ae_k = \sum_{i=1}^{n} a_{ik}e_i$, we obtain

$$a_{ik} = (Ae_k, e_i) \qquad (i, k = 1, 2, \ldots, n). \qquad (48)$$

²³ From the fact that U is unitary it follows that $U^T$ is unitary, since the condition (43), written in matrix form $U^T\bar U = E$, implies that $\bar UU^T = E$.


Now let $A^* = \|a_{ik}^*\|_1^n$ be the matrix corresponding to $A^*$ in the same basis. Then, by (48),

$$a_{ik}^* = (A^*e_k, e_i) \qquad (i, k = 1, 2, \ldots, n). \qquad (49)$$

From (48) and (49) it follows by (46) that

$$a_{ik}^* = \bar a_{ki} \qquad (i, k = 1, 2, \ldots, n),$$

i.e.,

The matrix $A^*$ is the complex conjugate of the transpose of A. This matrix will be called the adjoint of A. (This is not to be confused with the adjoint of a matrix as defined on p. 14.)

Thus: In an orthonormal basis adjoint matrices correspond to adjoint operators.

The following properties of the adjoint operator follow from its definition:

1. $(A^*)^* = A$,
2. $(A + B)^* = A^* + B^*$,
3. $(\alpha A)^* = \bar\alpha A^*$ ($\alpha$ a scalar),
4. $(AB)^* = B^*A^*$.

2. We shall now introduce an important concept. Let S be an arbitrary 
subspace of R. We denote by T the set of all vectors y of R that are orthogo- 
nal to S. It is easy to see that T is a subspace of R and that every vector x 
of R can be represented uniquely in the form of a sum $x = x_S + x_T$, where $x_S \in S$, $x_T \in T$, so that we have the resolution

$$R = S + T, \qquad S \perp T.$$

We obtain this resolution by applying the decomposition (15) to the arbitrary vector x of R. T is called the orthogonal complement of S. Obviously, S is the orthogonal complement of T. We write $S \perp T$, meaning by this that each vector of S is orthogonal to every vector of T.

Now we can formulate the fundamental property of the adjoint operator:

5. If a subspace S is invariant with respect to A, then the orthogonal complement T of the subspace is invariant with respect to A*.


§ 8. THE ADJOINT OPERATOR 267 


For let $x \in S$, $y \in T$. Then it follows from $Ax \in S$ that $(Ax, y) = 0$ and hence by (46) that $(x, A^* y) = 0$. Since x is an arbitrary vector of S, $A^* y \in T$, and this is what we had to prove.
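A small numerical sketch of the resolution $x = x_S + x_T$ and of property 5 follows; it is our own illustration, not the author's. It assumes $S = [e_1, e_2]$ in a three-dimensional unitary space and a block upper-triangular A chosen by hand, so that S is invariant under A and $T = [e_3]$.

```python
# Sketch of the resolution x = x_S + x_T and of property 5 for a hand-picked A.

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def adjoint(A):
    return [[a.conjugate() for a in col] for col in zip(*A)]

def project(x, basis):
    """Orthogonal projection x_S = sum_j (x, s_j) s_j onto span(basis);
    the vectors in `basis` are assumed to be orthonormal."""
    out = [0j] * len(x)
    for s in basis:
        c = inner(x, s)
        out = [o + c * si for o, si in zip(out, s)]
    return out

# The first two columns of A have a zero third entry, so A maps S into S.
A = [[1, 2j, 5],
     [0, 3, 1 - 1j],
     [0, 0, 2 + 1j]]
S_basis = [[1, 0, 0], [0, 1, 0]]

# Resolution x = x_S + x_T with x_T orthogonal to S
x = [1 + 1j, -2, 4j]
x_S = project(x, S_basis)
x_T = [a - b for a, b in zip(x, x_S)]
print([abs(inner(x_T, s)) for s in S_basis])  # both zero: x_T is orthogonal to S

# A* carries y = e3 (which spans T) back into T: its S-component vanishes
y = [0, 0, 1]
Ay = matvec(adjoint(A), y)
print(project(Ay, S_basis))  # zero vector: A*y stays in T
```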


We introduce the following definition : 


DEFINITION 5: Two systems of vectors $x_1, x_2, \ldots, x_m$ and $y_1, y_2, \ldots, y_m$ are called bi-orthogonal if

$$(x_i, y_k) = \delta_{ik} \qquad (i, k = 1, 2, \ldots, m), \qquad (50)$$

where $\delta_{ik}$ is the Kronecker symbol.

Now we shall prove the following proposition : 

6. If A is a linear operator of simple structure, then the adjoint operator A* is also of simple structure, and complete systems of characteristic vectors $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ of A and A* can be chosen such that they are bi-orthogonal:

$$A x_i = \lambda_i x_i, \qquad A^* y_i = \bar\lambda_i y_i, \qquad (x_i, y_k) = \delta_{ik} \qquad (i, k = 1, 2, \ldots, n).$$


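For let $x_1, x_2, \ldots, x_n$ be a complete system of characteristic vectors of A. We use the notation

$$S_k = [x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n] \qquad (k = 1, 2, \ldots, n).$$

Consider the one-dimensional orthogonal complement $T_k = [y_k]$ to the $(n-1)$-dimensional subspace $S_k$ $(k = 1, 2, \ldots, n)$. Since $S_k$ is invariant with respect to A, by 5. the subspace $T_k$ is invariant with respect to A*:

$$A^* y_k = \mu_k y_k, \quad y_k \neq o \qquad (k = 1, 2, \ldots, n).$$

From $S_k \perp y_k$ it follows that $(x_k, y_k) \neq 0$, because otherwise the vector $y_k$ would be orthogonal to all of $x_1, x_2, \ldots, x_n$ and would therefore have to be the null vector. Multiplying $x_k, y_k$ $(k = 1, 2, \ldots, n)$ by suitable numerical factors we obtain

$$(x_i, y_k) = \delta_{ik} \qquad (i, k = 1, 2, \ldots, n).$$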


From the bi-orthogonality of the systems x;, %2,..., %, and y;, Y2,..., Yn it 
follows that the vectors of each system are linearly independent. 
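For a concrete operator of simple structure that is not normal, a bi-orthogonal pair of systems can be read off from the matrix X whose columns are the characteristic vectors of A: the vector $y_k$ whose components are the complex conjugates of the k-th row of $X^{-1}$ satisfies $(x_i, y_k) = \delta_{ik}$. The sketch below is our own illustration under that assumption; the matrix A and its characteristic vectors were chosen and computed by hand.

```python
# Sketch of proposition 6 for a 2x2 operator of simple structure.

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def adjoint(A):
    return [[a.conjugate() for a in col] for col in zip(*A)]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def close(u, v, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(u, v))

A = [[1j, 1], [0, 2]]                  # triangular: characteristic values i and 2
lams = [1j, 2]
xs = [[1, 0], [1, 2 - 1j]]             # characteristic vectors of A (found by hand)
X = [list(col) for col in zip(*xs)]    # matrix with columns x_1, x_2
ys = [[v.conjugate() for v in row] for row in inv2(X)]  # y_k = conj(k-th row of X^{-1})

for lam, x, y in zip(lams, xs, ys):
    assert close(matvec(A, x), [lam * c for c in x])                       # A x_k = lambda_k x_k
    assert close(matvec(adjoint(A), y), [lam.conjugate() * c for c in y])  # A* y_k = conj(lambda_k) y_k

gram = [[inner(x, y) for y in ys] for x in xs]
print(gram)  # numerically the identity matrix: (x_i, y_k) = delta_ik
```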
We mention one further proposition : 


7. If the operators A and A* have a common characteristic vector, then 
the corresponding characteristic values are complex conjugates. 

For let $Ax = \lambda x$ and $A^* x = \mu x$ $(x \neq o)$. Then, setting $y = x$ in (46), we have $\lambda (x, x) = \bar\mu (x, x)$ and hence $\lambda = \bar\mu$.




§ 9. Normal Operators in a Unitary Space 


1. DEFINITION 6. A linear operator A is called normal if it commutes with 
its adjoint : 
AA* = A*A. (51) 


DEFINITION 7. A linear operator H is called hermitian if it is equal to its adjoint:
H* = H. (52)


DEFINITION 8. A linear operator U is called unitary if it is inverse to its adjoint:
UU* = E. (53)


Note that a unitary operator can be regarded as an isometric operator 
in a hermitian space, i.e., as an operator preserving the metric. For suppose that for arbitrary vectors x and y of R

$$(Ux, Uy) = (x, y). \qquad (54)$$

Then by (46)

$$(U^*Ux, y) = (x, y)$$

and therefore, since y is arbitrary,

$$U^*Ux = x,$$

i.e., $U^*U = E$, or $U^* = U^{-1}$. Conversely, (53) implies (54).

From (53) and (54) it follows that 1. the product of two unitary opera- 
tors is itself a unitary operator, 2. the unit operator E is unitary, and 3. the 
inverse of a unitary operator is also unitary. Therefore the set of all unitary operators is a group.$^{24}$ This is called the unitary group.
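The isometry property (54) and the group properties can be spot-checked on matrices. This is a minimal sketch with sample unitary matrices of our own choosing (a complex $2 \times 2$ matrix, a diagonal phase matrix, and a real rotation); `is_unitary` is a hypothetical helper that tests (53) up to rounding.

```python
import cmath
import math

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def adjoint(A):
    return [[a.conjugate() for a in col] for col in zip(*A)]

def is_unitary(U, tol=1e-12):
    # Checks (53): U U* = E
    P = matmul(U, adjoint(U))
    n = len(U)
    return all(abs(P[i][k] - (1 if i == k else 0)) < tol
               for i in range(n) for k in range(n))

r = 1 / math.sqrt(2)
U1 = [[r, 1j * r], [1j * r, r]]
U2 = [[cmath.exp(0.3j), 0], [0, cmath.exp(-1.1j)]]
t = 0.7
U3 = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

x = [1 + 2j, -0.5j]
y = [3, 1 - 1j]
# (54): a unitary matrix preserves the scalar product
preserves = abs(inner(matvec(U1, x), matvec(U1, y)) - inner(x, y)) < 1e-12
# Group properties: products and inverses (U^{-1} = U*) remain unitary
group = (is_unitary(matmul(U1, U2)) and is_unitary(matmul(U2, U3))
         and is_unitary(adjoint(U1)))
print(is_unitary(U1), preserves, group)  # True True True
```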

Hermitian operators and unitary operators are special cases of a normal 
operator. 


2. We have 
THEOREM 3: Every linear operator A can be represented in the form

$$A = H_1 + i H_2, \qquad (55)$$

where $H_1$ and $H_2$ are hermitian operators (the 'hermitian components' of A). The hermitian components are uniquely determined by A. The operator A is normal if and only if its hermitian components $H_1$ and $H_2$ are permutable.


24 See footnote 13 on p. 18. 




Proof. Suppose that (55) holds. Then

$$A^* = H_1 - i H_2. \qquad (56)$$

From (55) and (56) we have:

$$H_1 = \tfrac{1}{2}(A + A^*), \qquad H_2 = \tfrac{1}{2i}(A - A^*). \qquad (57)$$

Conversely, the formulas (57) define hermitian operators $H_1$ and $H_2$ connected with A by (55).

Now let A be a normal operator: $AA^* = A^*A$. Then it follows from (57) that $H_1 H_2 = H_2 H_1$. Conversely, from $H_1 H_2 = H_2 H_1$ it follows by (55) and (56) that $AA^* = A^*A$. This completes the proof.

The representation of an arbitrary linear operator A in the form (55) is an analogue to the representation of a complex number z in the form $z = x_1 + i x_2$, where $x_1$ and $x_2$ are real.
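Formulas (57) and the normality criterion of Theorem 3 can be tried on small matrices. The sketch below is our own illustration: the three sample matrices are assumptions chosen so that one is normal and two are not, and the check of $H_1 H_2 = H_2 H_1$ is done up to rounding.

```python
def adjoint(A):
    return [[a.conjugate() for a in col] for col in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def close(M, N, tol=1e-12):
    return all(abs(a - b) < tol for r, s in zip(M, N) for a, b in zip(r, s))

def hermitian_components(A):
    # (57): H1 = (A + A*)/2,  H2 = (A - A*)/(2i)
    As = adjoint(A)
    n = len(A)
    H1 = [[(A[i][k] + As[i][k]) / 2 for k in range(n)] for i in range(n)]
    H2 = [[(A[i][k] - As[i][k]) / 2j for k in range(n)] for i in range(n)]
    return H1, H2

def is_normal(A):
    return close(matmul(A, adjoint(A)), matmul(adjoint(A), A))

examples = [[[1, 2], [-2, 1]],     # normal
            [[0, 1], [0, 0]],      # not normal
            [[2, 1j], [1j, 3]]]    # complex symmetric, not normal
results = []
for A in examples:
    H1, H2 = hermitian_components(A)
    assert close(H1, adjoint(H1)) and close(H2, adjoint(H2))   # both hermitian
    assert close([[a + 1j * b for a, b in zip(r, s)]
                  for r, s in zip(H1, H2)], A)                  # (55): A = H1 + i H2
    results.append((is_normal(A), close(matmul(H1, H2), matmul(H2, H1))))
print(results)  # [(True, True), (False, False), (False, False)]
```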

Suppose that in some orthonormal basis the operators A, H, and U cor- 
respond to the matrices A, H, and U. Then the operator equations 


$$AA^* = A^*A, \qquad H^* = H, \qquad UU^* = E \qquad (58)$$
correspond to the matrix equations
$$AA^* = A^*A, \qquad H^* = H, \qquad UU^* = E. \qquad (59)$$


Therefore we define a matrix as normal if it commutes with its adjoint, as hermitian if it is equal to its adjoint, and finally as unitary if it is inverse to its adjoint.

Then: In an orthonormal basis a normal (hermitian, unitary) operator corresponds to a normal (hermitian, unitary) matrix.


A hermitian matrix $H = \| h_{ik} \|_1^n$ is, by (59), characterized by the following relation among its elements:

$$h_{ki} = \bar h_{ik} \qquad (i, k = 1, 2, \ldots, n);$$

i.e., a hermitian matrix is always the coefficient matrix of some hermitian form (see § 1).

A unitary matrix $U = \| u_{ik} \|_1^n$ is, by (59), characterized by the following relations among its elements:

$$\sum_{j=1}^{n} u_{ij} \bar u_{kj} = \delta_{ik} \qquad (i, k = 1, 2, \ldots, n). \qquad (60)$$




Since $UU^* = E$ implies that $U^*U = E$, from (60) there follow the equivalent relations:

$$\sum_{j=1}^{n} \bar u_{ji} u_{jk} = \delta_{ik} \qquad (i, k = 1, 2, \ldots, n). \qquad (61)$$

Equation (60) expresses the 'orthonormality' of the rows and equation (61) that of the columns of the matrix $U = \| u_{ik} \|_1^n$.$^{25}$


A unitary matrix is the coefficient matrix of some unitary transforma- 
tion (see § 7). 


§ 10. The Spectra of Normal, Hermitian, and Unitary Operators 


1. As a preliminary, we establish a property of permutable operators in the form of a lemma.


LEMMA 1: Permutable operators A and B (AB = BA) always have a common characteristic vector.

Proof. Let x be a characteristic vector of A: $Ax = \lambda x$, $x \neq o$. Then, since A and B are permutable,

$$A B^k x = \lambda B^k x \qquad (k = 0, 1, 2, \ldots). \qquad (62)$$

Suppose that in the sequence of vectors

$$x, \; Bx, \; B^2 x, \; \ldots$$

the first p are linearly independent, while the (p + 1)-th vector $B^p x$ is a linear combination of the preceding ones. Then $S = [x, Bx, \ldots, B^{p-1} x]$ is a subspace invariant with respect to B, so that in this subspace S there exists a characteristic vector y of B: $By = \mu y$, $y \neq o$. On the other hand, (62) shows that the vectors $x, Bx, \ldots, B^{p-1} x$ are characteristic vectors of A corresponding to one and the same characteristic value $\lambda$. Therefore every linear combination of these vectors, and in particular y, is a characteristic vector of A corresponding to $\lambda$. Thus we have proved the existence of a common characteristic vector of the operators A and B.
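The construction in the proof of Lemma 1 can be traced on a hand-picked commuting pair. In this sketch (our own illustration) we assume $A = \mathrm{diag}(1, 1, 2)$ and a B that swaps $e_1, e_2$; here $p = 2$, so $B^2 x = c_0 x + c_1 Bx$, and B acts on $S = [x, Bx]$ through the companion matrix with columns $(0, 1)$ and $(c_0, c_1)$, whose characteristic vector for $\mu$ is $(\mu - c_1, 1)$.

```python
import cmath

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def close(u, v, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(u, v))

A = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]
B = [[0, 1, 0], [1, 0, 0], [0, 0, 3]]
assert matmul(A, B) == matmul(B, A)         # A and B are permutable

x = [1, 0, 0]                                # A x = 1 * x
Bx = matvec(B, x)
BBx = matvec(B, Bx)                          # here p = 2: B^2 x lies in [x, Bx]

# Coefficients of B^2 x = c0 x + c1 Bx from the 2x2 Gram system (Cramer's rule)
g11, g12, g21, g22 = inner(x, x), inner(Bx, x), inner(x, Bx), inner(Bx, Bx)
r1, r2 = inner(BBx, x), inner(BBx, Bx)
det = g11 * g22 - g12 * g21
c0 = (r1 * g22 - g12 * r2) / det
c1 = (g11 * r2 - r1 * g21) / det

# Characteristic value of B on S: mu^2 - c1 mu - c0 = 0; vector (mu - c1, 1) in basis x, Bx
mu = (c1 + cmath.sqrt(c1 * c1 + 4 * c0)) / 2
y = [(mu - c1) * a + b for a, b in zip(x, Bx)]

print(close(matvec(A, y), y), close(matvec(B, y), [mu * c for c in y]))  # True True
```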

Let A be an arbitrary normal operator in an n-dimensional hermitian space R. In that case A and A* are permutable and therefore have a common characteristic vector $x_1$. Then (see § 8, 7.)


25 Thus, orthonormality of the columns of the matrix U is a consequence of the orthonormality of the rows, and vice versa.




$$A x_1 = \lambda_1 x_1, \qquad A^* x_1 = \bar\lambda_1 x_1 \qquad (x_1 \neq o).$$

We denote by $S_1$ the one-dimensional subspace containing the vector $x_1$ ($S_1 = [x_1]$) and by $T_1$ the orthogonal complement of $S_1$ in R:

$$R = S_1 + T_1, \qquad S_1 \perp T_1.$$

Since $S_1$ is invariant with respect to A and A*, $T_1$ is also invariant with respect to these operators (see § 8, 5.). Therefore, by Lemma 1, the permutable operators A and A* have a common characteristic vector $x_2$ in $T_1$:

$$A x_2 = \lambda_2 x_2, \qquad A^* x_2 = \bar\lambda_2 x_2 \qquad (x_2 \neq o).$$

Obviously, $x_1 \perp x_2$. Setting $S_2 = [x_1, x_2]$ and

$$R = S_2 + T_2, \qquad S_2 \perp T_2,$$

we establish in a similar way the existence of a common characteristic vector $x_3$ of A and A* in $T_2$. Obviously $x_1 \perp x_3$ and $x_2 \perp x_3$. Continuing this process, we obtain n pairwise orthogonal common characteristic vectors $x_1, x_2, \ldots, x_n$ of A and A*:

$$A x_k = \lambda_k x_k, \qquad A^* x_k = \bar\lambda_k x_k \qquad (x_k \neq o),$$

$$(x_i, x_k) = 0 \quad \text{for } i \neq k \qquad (i, k = 1, 2, \ldots, n). \qquad (63)$$

The vectors $x_1, x_2, \ldots, x_n$ can be normalized without violating (63).

Thus we have proved that a normal operator always has a complete orthonormal system of characteristic vectors.$^{26}$

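Since $\lambda_k = \lambda_l$ always implies that $\bar\lambda_k = \bar\lambda_l$, it follows from (63) that:

1. If A is a normal operator, every characteristic vector of A is a characteristic vector of the adjoint operator A*, i.e., if A is a normal operator then A and A* have the same characteristic vectors. (A characteristic vector of A belonging to $\lambda$ is a linear combination of those $x_k$ with $\lambda_k = \lambda$, and A* multiplies each of these by $\bar\lambda$.)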

Suppose now, conversely, that a linear operator A has a complete orthonormal system of characteristic vectors:

$$A x_k = \lambda_k x_k, \qquad (x_k, x_l) = \delta_{kl} \qquad (k, l = 1, 2, \ldots, n).$$

We shall show that A is then a normal operator. For let us set:

$$y_k = A^* x_k - \bar\lambda_k x_k .$$

Then

$$(x_l, y_k) = (x_l, A^* x_k) - \lambda_k (x_l, x_k) = (A x_l, x_k) - \lambda_k (x_l, x_k) = (\lambda_l - \lambda_k)\, \delta_{lk} = 0 \qquad (k, l = 1, 2, \ldots, n).$$

Hence it follows that


26 Here, and in what follows, we mean by a complete orthonormal system of vectors an orthonormal system of n vectors, where n is the dimension of the space.


$$y_k = A^* x_k - \bar\lambda_k x_k = o \qquad (k = 1, 2, \ldots, n),$$

i.e., that (63) holds. But then

$$AA^* x_k = \bar\lambda_k \lambda_k x_k \quad \text{and} \quad A^*A x_k = \lambda_k \bar\lambda_k x_k \qquad (k = 1, 2, \ldots, n),$$

or

$$AA^* = A^*A.$$

Thus we have obtained the following 'internal' (spectral) characterization of a normal operator A (apart from the 'external' one: $AA^* = A^*A$):

THEOREM 4: A linear operator is normal if and only if it has a complete orthonormal system of characteristic vectors.

In particular, we have shown that a normal operator is always of simple structure.
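Theorem 4 can be seen on the simplest example, the rotation of the plane through a right angle. The sketch below is our own illustration (the characteristic vectors were computed by hand): it checks that they are orthonormal and that A* has the same characteristic vectors with conjugate values, and contrasts this with a non-normal operator of simple structure whose characteristic vectors are not orthogonal.

```python
import math

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def adjoint(A):
    return [[a.conjugate() for a in col] for col in zip(*A)]

def close(u, v, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(u, v))

r = 1 / math.sqrt(2)
A = [[0, -1], [1, 0]]                         # normal: A A* = A* A = E
pairs = [(1j, [r, -1j * r]), (-1j, [r, 1j * r])]
for lam, x in pairs:
    assert close(matvec(A, x), [lam * c for c in x])                       # A x = lambda x
    assert close(matvec(adjoint(A), x), [lam.conjugate() * c for c in x])  # A* x = conj(lambda) x
gram = [[inner(x, y) for _, y in pairs] for _, x in pairs]
print(gram)  # numerically the 2x2 identity: the system is orthonormal

N = [[1, 1], [0, 2]]                          # simple structure but not normal
u, v = [1, 0], [1, 1]                         # characteristic vectors for 1 and 2
print(inner(u, v))  # 1, so they are not orthogonal
```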
Let A be a normal operator with the characteristic values $\lambda_1, \lambda_2, \ldots, \lambda_n$. Using the Lagrange interpolation formula, we define two polynomials $p(\lambda)$ and $q(\lambda)$ by the conditions

$$p(\lambda_k) = \bar\lambda_k, \qquad q(\bar\lambda_k) = \lambda_k \qquad (k = 1, 2, \ldots, n).$$

Then by (63)

$$A^* = p(A), \qquad A = q(A^*); \qquad (64)$$

i.e.,

2. If A is a normal operator, then each of the operators A and A* can be represented as a polynomial in the other; these two polynomials are determined by the characteristic values of A.
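Formula (64) can be verified by evaluating the Lagrange interpolation polynomial at a matrix. This is a sketch under our own assumptions: `lagrange_at_matrix` is a hypothetical helper that expects pairwise distinct nodes, and the sample normal matrix has characteristic values $1 \pm 2i$ (for it, $p(\lambda) = 2 - \lambda$).

```python
import cmath

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def adjoint(A):
    return [[a.conjugate() for a in col] for col in zip(*A)]

def close(M, N, tol=1e-12):
    return all(abs(a - b) < tol for r, s in zip(M, N) for a, b in zip(r, s))

def lagrange_at_matrix(A, nodes, values):
    """Evaluate at the matrix A the Lagrange polynomial p with
    p(nodes[k]) = values[k]; the nodes must be pairwise distinct."""
    n = len(A)
    E = [[1 if i == k else 0 for k in range(n)] for i in range(n)]
    result = [[0j] * n for _ in range(n)]
    for k, (lk, vk) in enumerate(zip(nodes, values)):
        term = [[vk * e for e in row] for row in E]
        for j, lj in enumerate(nodes):
            if j != k:
                factor = [[(A[i][m] - lj * E[i][m]) / (lk - lj) for m in range(n)]
                          for i in range(n)]
                term = matmul(term, factor)
        result = [[a + b for a, b in zip(r, s)] for r, s in zip(result, term)]
    return result

A = [[1, 2], [-2, 1]]                        # normal matrix
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = cmath.sqrt(tr * tr - 4 * det)
lams = [(tr + disc) / 2, (tr - disc) / 2]    # 1 + 2i and 1 - 2i
conj = [lam.conjugate() for lam in lams]

p_of_A = lagrange_at_matrix(A, lams, conj)               # p(lambda_k) = conj(lambda_k)
q_of_Astar = lagrange_at_matrix(adjoint(A), conj, lams)  # q(conj(lambda_k)) = lambda_k
print(close(p_of_A, adjoint(A)), close(q_of_Astar, A))   # True True
```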

Let S be an invariant subspace of R for a normal operator A and let $R = S + T$, $S \perp T$. Then by § 8, 5., the subspace T is invariant with respect to A*. But $A = q(A^*)$, where $q(\lambda)$ is a polynomial. Therefore T is also invariant with respect to A. Thus:

3. If S is an invariant subspace with respect to a normal operator A and T is the orthogonal complement of S, then T is also an invariant subspace for A.

2. Let us now discuss the spectrum of a hermitian operator. Since a her- 
mitian operator H is a special form of a normal operator, by what we have 
proved it has a complete orthonormal system of characteristic vectors: 


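$$H x_k = \lambda_k x_k, \qquad (x_k, x_l) = \delta_{kl} \qquad (k, l = 1, 2, \ldots, n). \qquad (65)$$

From $H^* = H$ and (63), which gives $H^* x_k = \bar\lambda_k x_k$, it follows that

$$\bar\lambda_k = \lambda_k \qquad (k = 1, 2, \ldots, n), \qquad (66)$$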




i.e., all the characteristic values of a hermitian operator H are real. 
It is not difficult to see that, conversely, a normal operator with real 
characteristic values is always hermitian. For from (65), (66), and 


$$H^* x_k = \bar\lambda_k x_k \qquad (k = 1, 2, \ldots, n)$$

it follows that

$$H^* x_k = H x_k \qquad (k = 1, 2, \ldots, n),$$

i.e.,

$$H^* = H.$$


We have obtained the following ‘internal’ characterization of a hermitian 
operator (apart from the ‘external’ one: H* =H): 


THEOREM 5: A linear operator H is hermitian if and only if it has a complete orthonormal system of characteristic vectors with real characteristic values.

Let us now discuss the spectrum of a unitary operator. Since a unitary 
operator U is normal, it has a complete orthonormal system of characteristic 
vectors 

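$$U x_k = \lambda_k x_k, \qquad (x_k, x_l) = \delta_{kl} \qquad (k, l = 1, 2, \ldots, n), \qquad (67)$$

where

$$U^* x_k = \bar\lambda_k x_k \qquad (k = 1, 2, \ldots, n). \qquad (68)$$

From $UU^* = E$ we find:

$$\lambda_k \bar\lambda_k = 1 \qquad (k = 1, 2, \ldots, n). \qquad (69)$$

Conversely, from (67), (68), and (69) it follows that $UU^* = E$. Thus,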
among the normal operators a unitary operator is distinguished by the fact 
that all its characteristic values have modulus 1. 

We have thus obtained the following 'internal' characterization of a unitary operator (apart from the 'external' one: $UU^* = E$):

THEOREM 6: A linear operator is unitary if and only if it has a complete orthonormal system of characteristic vectors with characteristic values of modulus 1.

Since in an orthonormal basis a normal (hermitian, unitary) matrix 
corresponds to a normal (hermitian, unitary) operator, we obtain the fol- 
lowing propositions: 

THEOREM 4': A matrix A is normal if and only if it is unitarily similar to a diagonal matrix:

$$A = U \, \| \lambda_i \delta_{ik} \|_1^n \, U^{-1} \qquad (U^* = U^{-1}). \qquad (70)$$




THEOREM 5': A matrix H is hermitian if and only if it is unitarily similar to a diagonal matrix with real diagonal elements:

$$H = U \, \| \lambda_i \delta_{ik} \|_1^n \, U^{-1} \qquad (U^* = U^{-1};\ \lambda_i = \bar\lambda_i;\ i = 1, 2, \ldots, n). \qquad (71)$$


THEOREM 6': A matrix U is unitary if and only if it is unitarily similar to a diagonal matrix with diagonal elements of modulus 1:

$$U = U_1 \, \| \lambda_i \delta_{ik} \|_1^n \, U_1^{-1} \qquad (U_1^* = U_1^{-1};\ |\lambda_i| = 1;\ i = 1, 2, \ldots, n). \qquad (72)$$


§ 11. Positive-Semidefinite and Positive-Definite Hermitian Operators 


1. We introduce the following definition: 


DEFINITION 9: A hermitian operator H is called positive semidefinite if for every vector x of R

$$(Hx, x) \geq 0,$$

and positive definite if for every vector $x \neq o$ of R

$$(Hx, x) > 0.$$
If a vector x is given by its coordinates $x_1, x_2, \ldots, x_n$ in an arbitrary orthonormal basis, then (Hx, x), as is easy to see, is a hermitian form in the variables $x_1, x_2, \ldots, x_n$; and to a positive-semidefinite (positive-definite) operator there corresponds a positive-semidefinite (positive-definite) hermitian form (see § 1).

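We choose an orthonormal basis $x_1, x_2, \ldots, x_n$ of characteristic vectors of H:

$$H x_k = \lambda_k x_k, \qquad (x_k, x_l) = \delta_{kl} \qquad (k, l = 1, 2, \ldots, n). \qquad (73)$$

Then, setting $x = \sum_{k=1}^{n} \xi_k x_k$, we have

$$(Hx, x) = \sum_{k=1}^{n} \lambda_k |\xi_k|^2 .$$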


Hence we easily deduce the ‘internal’ characterizations of positive-semi- 
definite and positive-definite operators: 


THEOREM 7: A hermitian operator is positive semidefinite (positive definite) if and only if all its characteristic values are non-negative (positive).




From what we have shown, it follows that a positive-definite hermitian 
operator is non-singular and positive semidefinite. 

Let H be a positive-semidefinite hermitian operator. The equation (73) holds for H with $\lambda_k \geq 0$ $(k = 1, 2, \ldots, n)$. We set $\varrho_k = \sqrt{\lambda_k} \geq 0$ $(k = 1, 2, \ldots, n)$ and define a linear operator F by the equation

$$F x_k = \varrho_k x_k \qquad (k = 1, 2, \ldots, n). \qquad (74)$$

Then F is also a positive-semidefinite operator and

$$F^2 = H. \qquad (75)$$

We shall call the positive-semidefinite hermitian operator F connected with H by (75) the arithmetical square root of H and shall denote it by

$$F = \sqrt{H}.$$

If H is positive definite, then F is also positive definite.

We define the Lagrange interpolation polynomial $g(\lambda)$ by the equations

$$g(\lambda_k) = \varrho_k \; (= \sqrt{\lambda_k}) \qquad (k = 1, 2, \ldots, n). \qquad (76)$$

Then from (73), (74), and (76) it follows that:

$$F = g(H). \qquad (77)$$

The latter equation shows that $\sqrt{H}$ is a polynomial in H and is uniquely determined when the positive-semidefinite hermitian operator H is given (the coefficients of $g(\lambda)$ depend on the characteristic values of H).
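Formula (77) can be illustrated on a $2 \times 2$ hermitian matrix: the characteristic values are found from the quadratic characteristic equation, checked to be non-negative (Theorem 7), and then $\sqrt{H} = g(H)$ is formed with the interpolation polynomial (76). The sample matrix (characteristic values 1 and 3) is our own assumption.

```python
import math

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def adjoint(A):
    return [[a.conjugate() for a in col] for col in zip(*A)]

def close(M, N, tol=1e-12):
    return all(abs(a - b) < tol for r, s in zip(M, N) for a, b in zip(r, s))

H = [[2, 1j], [-1j, 2]]                      # hermitian
tr = (H[0][0] + H[1][1]).real
det = (H[0][0] * H[1][1] - H[0][1] * H[1][0]).real
disc = math.sqrt(tr * tr - 4 * det)
l1, l2 = (tr - disc) / 2, (tr + disc) / 2    # 1 and 3: both >= 0 (Theorem 7)

# g(lam) = sqrt(l1) (lam - l2)/(l1 - l2) + sqrt(l2) (lam - l1)/(l2 - l1), so g(l_k) = sqrt(l_k)
E = [[1, 0], [0, 1]]
F = [[math.sqrt(l1) * (H[i][k] - l2 * E[i][k]) / (l1 - l2)
      + math.sqrt(l2) * (H[i][k] - l1 * E[i][k]) / (l2 - l1)
      for k in range(2)] for i in range(2)]  # F = g(H), formula (77)

print(close(matmul(F, F), H), close(F, adjoint(F)))  # True True
```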


2. Examples of positive-semidefinite hermitian operators are AA* and A*A, where A is an arbitrary linear operator in the given space. Indeed, for an arbitrary vector x,

$$(AA^* x, x) = (A^* x, A^* x) \geq 0,$$
$$(A^*A x, x) = (Ax, Ax) \geq 0.$$

If A is non-singular, then AA* and A*A are positive-definite hermitian operators.

The operators AA* and A*A are sometimes called the left norm and right norm of A. $\sqrt{AA^*}$ and $\sqrt{A^*A}$ are called the left modulus and right modulus of A.

For a normal operator the left and right norms, and hence the left and right moduli, are equal.$^{27}$


27 For a detailed study of normal operators, see [168]. In this paper necessary and
sufficient conditions for the product of two normal operators to be normal are established. 




§ 12. Polar Decomposition of a Linear Operator in a Unitary Space. 
Cayley’s Formulas 


1. We shall prove the following theorem:$^{28}$


THEOREM 8: Every linear operator A in a unitary space can be represented in the forms

$$A = HU, \qquad (78)$$

$$A = U_1 H_1, \qquad (79)$$

where H, $H_1$ are positive-semidefinite hermitian operators and U, $U_1$ are unitary operators. A is normal if and only if in (78) (or (79)) the factors H and U ($H_1$ and $U_1$) are permutable.

Proof. From (78) and (79) it follows that H and $H_1$ are the left and right moduli, respectively, of A. For

$$AA^* = HUU^*H = H^2, \qquad A^*A = H_1 U_1^* U_1 H_1 = H_1^2 .$$

Note that it is sufficient to establish (78), since by applying this decomposition to A* we obtain $A^* = H_1 U_1^*$ and hence

$$A = U_1 H_1,$$

i.e., the decomposition (79) for A.
We begin by establishing (78) in the special case where A is non-singular ($|A| \neq 0$). We set:

$$H = \sqrt{AA^*} \quad (\text{here } |H|^2 = |AA^*| \neq 0), \qquad U = H^{-1} A$$

and verify that U is unitary:

$$UU^* = H^{-1} A A^* H^{-1} = H^{-1} H^2 H^{-1} = E.$$

Note that in this case not only the first factor H in (78), but also the second factor U is uniquely determined by the non-singular operator A.
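For a non-singular $2 \times 2$ matrix the construction $H = \sqrt{AA^*}$, $U = H^{-1}A$ can be carried out directly. The sketch assumes a closed-form square root valid only for $2 \times 2$ hermitian positive-definite M, namely $\sqrt{M} = (M + sE)/t$ with $s = \sqrt{|M|}$ and $t = \sqrt{\operatorname{tr} M + 2s}$ (a consequence of the Cayley-Hamilton theorem applied to $\sqrt{M}$); this shortcut is our substitute for the interpolation formula (77). It also checks the last assertion of Theorem 8 on one non-normal and one normal example.

```python
import math

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def adjoint(A):
    return [[a.conjugate() for a in col] for col in zip(*A)]

def close(M, N, tol=1e-12):
    return all(abs(a - b) < tol for r, s in zip(M, N) for a, b in zip(r, s))

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def sqrt_pd_2x2(M):
    """Arithmetical square root of a 2x2 hermitian positive-definite M:
    sqrt(M) = (M + s E) / t with s = sqrt(det M), t = sqrt(tr M + 2 s)."""
    tr = (M[0][0] + M[1][1]).real
    det = (M[0][0] * M[1][1] - M[0][1] * M[1][0]).real
    s = math.sqrt(det)
    t = math.sqrt(tr + 2 * s)
    return [[(M[i][k] + (s if i == k else 0)) / t for k in range(2)] for i in range(2)]

def polar(A):
    # (78) for a non-singular A: H = sqrt(A A*), U = H^{-1} A
    H = sqrt_pd_2x2(matmul(A, adjoint(A)))
    return H, matmul(inv2(H), A)

E = [[1, 0], [0, 1]]
results = []
for A in ([[1, 1j], [0, 2]], [[1, 2], [-2, 1]]):   # not normal, then normal
    H, U = polar(A)
    assert close(matmul(H, U), A)                   # A = H U
    assert close(matmul(U, adjoint(U)), E)          # U is unitary
    normal = close(matmul(A, adjoint(A)), matmul(adjoint(A), A))
    commute = close(matmul(H, U), matmul(U, H))
    results.append((normal, commute))
print(results)  # [(False, False), (True, True)]
```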


We now consider the general case where A may be singular. 

First of all we observe that a complete orthonormal system of characteristic vectors of the right norm of A is always transformed by A into an orthogonal system of vectors. For let

$$A^*A x_k = \varrho_k^2 x_k \qquad [(x_k, x_l) = \delta_{kl};\ \varrho_k \geq 0;\ k, l = 1, 2, \ldots, n].$$

Then

$$(Ax_k, Ax_l) = (A^*A x_k, x_l) = \varrho_k^2 (x_k, x_l) = 0 \qquad (k \neq l).$$


28 See [168], p. 77. 




Here

$$|Ax_k|^2 = (Ax_k, Ax_k) = \varrho_k^2 \qquad (k = 1, 2, \ldots, n).$$

Therefore there exists an orthonormal system of vectors $z_1, z_2, \ldots, z_n$ such that

$$A x_k = \varrho_k z_k \qquad [(z_k, z_l) = \delta_{kl};\ k, l = 1, 2, \ldots, n] \qquad (80)$$

(for $\varrho_k = 0$ the vector $z_k$ is chosen so as to complete the orthonormal system). We define linear operators H and U by the equations

$$U x_k = z_k, \qquad H z_k = \varrho_k z_k. \qquad (81)$$

From (80) and (81) we find:

$$A = HU.$$

Here H is, by (81), a positive-semidefinite hermitian operator, because it has a complete orthonormal system of characteristic vectors $z_1, z_2, \ldots, z_n$ with non-negative characteristic values $\varrho_1, \varrho_2, \ldots, \varrho_n$; and U is a unitary operator, because it carries the orthonormal system of vectors $x_1, x_2, \ldots, x_n$ into the orthonormal system $z_1, z_2, \ldots, z_n$.

Thus we can take it as proved that an arbitrary linear operator A has decompositions (78) and (79), that the hermitian factors H and $H_1$ are always uniquely determined by A (they are the left and right moduli of A, respectively) and that the unitary factors U and $U_1$ are uniquely determined only when A is non-singular.

From (78) we find easily:

$$AA^* = H^2, \qquad A^*A = U^{-1} H^2 U. \qquad (82)$$

If A is a normal operator ($AA^* = A^*A$), then it follows from (82) that

$$H^2 U = U H^2. \qquad (83)$$

Since $H = \sqrt{H^2} = g(H^2)$ (see § 11), (83) shows that U and H commute. Conversely, if H and U commute, then it follows from (82) that A is normal. This completes the proof of the theorem.$^{29}$


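29 If the characteristic values $\lambda_1, \lambda_2, \ldots, \lambda_n$ and $\varrho_1, \varrho_2, \ldots, \varrho_n$ of the linear operator A and its left modulus $H = \sqrt{AA^*}$ (by (82) $\varrho_1, \varrho_2, \ldots, \varrho_n$ are also the characteristic values of the right modulus $H_1 = \sqrt{A^*A}$) are so numbered that

$$|\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_n|, \qquad \varrho_1 \geq \varrho_2 \geq \cdots \geq \varrho_n,$$

then (see [379], or [153] and [296]) the following inequality of Weyl holds:

$$|\lambda_1| \leq \varrho_1, \quad |\lambda_1| + |\lambda_2| \leq \varrho_1 + \varrho_2, \quad \ldots, \quad |\lambda_1| + |\lambda_2| + \cdots + |\lambda_n| \leq \varrho_1 + \varrho_2 + \cdots + \varrho_n .$$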




It is hardly necessary to mention that together with the operator equations (78) and (79) the corresponding matrix equations hold.

The decompositions (78) and (79) are analogues to the representation 
of a complex number z in the form z= ru, where r =| z| and | u|=1. 


2. Now let $x_1, x_2, \ldots, x_n$ be a complete orthonormal system of characteristic vectors of the arbitrary unitary operator U. Then

$$U x_k = e^{i f_k} x_k, \qquad (x_k, x_l) = \delta_{kl} \qquad (k, l = 1, 2, \ldots, n), \qquad (84)$$

where the $f_k$ $(k = 1, 2, \ldots, n)$ are real numbers. We define a hermitian operator F by the equations

$$F x_k = f_k x_k \qquad (k = 1, 2, \ldots, n). \qquad (85)$$

From (84) and (85) it follows that:$^{30}$

$$U = e^{iF}. \qquad (86)$$

Thus, a unitary operator U is always representable in the form (86), where F is a hermitian operator. Conversely, if F is a hermitian operator, then $U = e^{iF}$ is unitary.
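A quick check of (86): for the hermitian matrix $F = \varphi \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}$ we have $iF = \varphi \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, so $e^{iF}$ should be the plane rotation through $\varphi$. The sketch computes the exponential from its power series, which is our stand-in for the interpolation polynomial of footnote 30 and is adequate only for small, well-scaled matrices like this one.

```python
import math

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def adjoint(A):
    return [[a.conjugate() for a in col] for col in zip(*A)]

def close(M, N, tol=1e-12):
    return all(abs(a - b) < tol for r, s in zip(M, N) for a, b in zip(r, s))

def expm(M, terms=40):
    """exp(M) from the power series sum_k M^k / k!; adequate for the small,
    well-scaled matrices used here, not a general-purpose method."""
    n = len(M)
    result = [[complex(i == k) for k in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, M)]   # M^k / k!
        result = [[a + b for a, b in zip(r, s)] for r, s in zip(result, term)]
    return result

phi = 0.9
F = [[0, 1j * phi], [-1j * phi, 0]]                # hermitian, characteristic values +/- phi
U = expm([[1j * v for v in row] for row in F])     # U = e^{iF}
R = [[math.cos(phi), -math.sin(phi)], [math.sin(phi), math.cos(phi)]]
print(close(F, adjoint(F)), close(U, R), close(matmul(U, adjoint(U)), [[1, 0], [0, 1]]))
# True True True
```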

The decompositions (78) and (79) together with (86) give the follow- 
ing equations: 


$$A = H e^{iF}, \qquad (87)$$

$$A = e^{iF_1} H_1, \qquad (88)$$

where H, F, $H_1$, and $F_1$ are hermitian operators, with H and $H_1$ positive semidefinite.

The decompositions (87) and (88) are analogues to the representation of a complex number z in the form $z = r e^{i\varphi}$, where $r \geq 0$ and $\varphi$ are real numbers.

Note. In (86), the operator F is not uniquely determined by U. For F is defined by means of the numbers $f_k$ $(k = 1, 2, \ldots, n)$ and we can add to each of these numbers an arbitrary multiple of $2\pi$ without changing the original equations (84). By choosing these multiples of $2\pi$ suitably we can assume that $e^{i f_k} = e^{i f_l}$ always implies that $f_k = f_l$ $(k, l = 1, 2, \ldots, n)$. Then we can determine the interpolation polynomial $g(\lambda)$ by the equations

$$g(e^{i f_k}) = f_k \qquad (k = 1, 2, \ldots, n). \qquad (89)$$


30 $e^{iF} = r(F)$, where $r(\lambda)$ is the Lagrange interpolation polynomial for the function $e^{i\lambda}$ at the places $f_1, f_2, \ldots, f_n$.


From (84), (85), and (89) it follows that

$$F = g(U) = g(e^{iF}). \qquad (90)$$

Similarly we can normalize the choice of $F_1$ so that

$$F_1 = h(U_1) = h(e^{iF_1}), \qquad (91)$$

where $h(\lambda)$ is a polynomial.

By (90) and (91), the permutability of H and U ($H_1$ and $U_1$) implies that of H and F ($H_1$ and $F_1$), and vice versa. Therefore, by Theorem 8, A is normal if and only if in (87) H and F (or, in (88), $H_1$ and $F_1$) are permutable, provided the characteristic values of F (or $F_1$) are suitably normalized.

The formula (86) is based on the fact that the functional dependence

$$\mu = e^{if} \qquad (92)$$

carries n arbitrary numbers $f_1, f_2, \ldots, f_n$ on the real axis into certain numbers $\mu_1, \mu_2, \ldots, \mu_n$ on the unit circle $|\mu| = 1$, and vice versa.

The transcendental dependence (92) can be replaced by the rational dependence

$$\mu = \frac{1 + if}{1 - if}, \qquad (93)$$

which carries the real axis $f = \bar f$ into the circle $|\mu| = 1$; here the point at infinity on the real axis goes over into the point $\mu = -1$. From (93), we find:

$$f = i \, \frac{1 - \mu}{1 + \mu}. \qquad (94)$$

Repeating the arguments which have led us to the formula (86), we obtain from (93) and (94) the pair of inverse formulas:

$$U = (E + iF)(E - iF)^{-1}, \qquad F = i(E - U)(E + U)^{-1}. \qquad (95)$$


We have thus obtained Cayley's formulas. These formulas establish a one-to-one correspondence between arbitrary hermitian operators F and those unitary operators U that do not have the characteristic value $-1$.$^{31}$


31 The exceptional value $-1$ can be replaced by any number $\mu_0$ ($|\mu_0| = 1$). For this purpose, we have to take instead of (93) a fractional-linear function mapping the real axis $f = \bar f$ onto the circle $|\mu| = 1$ and carrying the point $f = \infty$ into $\mu = \mu_0$. The formulas (94) and (95) can be modified correspondingly.




The formulas (86), (87), (88), and (95) are obviously valid when we 
replace all the operators by the corresponding matrices. 
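Cayley's formulas (95) in matrix form can be checked by a round trip: start from a hermitian F, form U, verify that U is unitary, and recover F. The sample matrix F and the helper `comb` (an entrywise linear combination) are our own assumptions; $E - iF$ is invertible because its characteristic values are $1 - if_k$ with $f_k$ real.

```python
def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def adjoint(A):
    return [[a.conjugate() for a in col] for col in zip(*A)]

def close(M, N, tol=1e-10):
    return all(abs(a - b) < tol for r, s in zip(M, N) for a, b in zip(r, s))

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def comb(a, X, b, Y):
    """Entrywise a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(r, s)] for r, s in zip(X, Y)]

E = [[1, 0], [0, 1]]
F = [[1, 2 - 1j], [2 + 1j, -0.5]]                                # hermitian
U = matmul(comb(1, E, 1j, F), inv2(comb(1, E, -1j, F)))          # U = (E + iF)(E - iF)^{-1}
F_back = [[1j * v for v in row]
          for row in matmul(comb(1, E, -1, U), inv2(comb(1, E, 1, U)))]  # F = i(E - U)(E + U)^{-1}
print(close(F, adjoint(F)), close(matmul(U, adjoint(U)), E), close(F_back, F))  # True True True
```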


§ 13. Linear Operators in a Euclidean Space
1. We consider an n-dimensional euclidean space R. Let A be a linear operator in R.


DEFINITION 10: The linear operator $A^{\mathsf T}$ is called the transposed operator of A (or the transpose of A) if for any two vectors x and y of R:

$$(Ax, y) = (x, A^{\mathsf T} y). \qquad (96)$$


The existence and uniqueness of the transposed operator is established 
in exactly the same way as was done in § 8 for the adjoint operator in a 
unitary space. 

The transposed operator has the following properties: 

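1. $(A^{\mathsf T})^{\mathsf T} = A$,

2. $(A + B)^{\mathsf T} = A^{\mathsf T} + B^{\mathsf T}$,

3. $(\alpha A)^{\mathsf T} = \alpha A^{\mathsf T}$ ($\alpha$ a real number),

4. $(AB)^{\mathsf T} = B^{\mathsf T} A^{\mathsf T}$.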


We introduce a number of definitions. 


DEFINITION 11: A linear operator A is called normal if
$$AA^{\mathsf T} = A^{\mathsf T}A.$$
DEFINITION 12: A linear operator S is called symmetric if
$$S^{\mathsf T} = S.$$


DEFINITION 13: A symmetric operator S is called positive semidefinite if for every vector x of R
$$(Sx, x) \geq 0.$$


DEFINITION 14: A symmetric operator S is called positive definite if for every vector $x \neq o$ of R
$$(Sx, x) > 0.$$


DEFINITION 15: A linear operator K is called skew-symmetric if
$$K^{\mathsf T} = -K.$$




An arbitrary linear operator A can always be represented uniquely in the form

$$A = S + K, \qquad (97)$$

where S is symmetric and K is skew-symmetric. For it follows from (97) that

$$A^{\mathsf T} = S - K. \qquad (98)$$

From (97) and (98) we have:

$$S = \tfrac{1}{2}(A + A^{\mathsf T}), \qquad K = \tfrac{1}{2}(A - A^{\mathsf T}). \qquad (99)$$

Conversely, (99) defines a symmetric operator S and a skew-symmetric operator K for which (97) holds.

S and K are called respectively the symmetric component and the skew- 
symmetric component of A. 
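Formula (99) is easy to apply to a real matrix in an orthonormal basis. The sketch below (our own sample matrix, plain Python) computes the symmetric and skew-symmetric components and checks (97).

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def split(A):
    # (99): S = (A + A^T)/2,  K = (A - A^T)/2
    At = transpose(A)
    n = len(A)
    S = [[(A[i][k] + At[i][k]) / 2 for k in range(n)] for i in range(n)]
    K = [[(A[i][k] - At[i][k]) / 2 for k in range(n)] for i in range(n)]
    return S, K

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
S, K = split(A)
print(S)  # [[1.0, 3.0, 5.0], [3.0, 5.0, 7.0], [5.0, 7.0, 10.0]]
print(K)  # [[0.0, -1.0, -2.0], [1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]
```

Since all entries here are small integers or halves, the results are exact and (97), $A = S + K$, holds entry by entry.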


DEFINITION 16: An operator Q is called orthogonal if it preserves the metric of the space, i.e., if for any two vectors x, y of R

$$(Qx, Qy) = (x, y). \qquad (100)$$

By (96), equation (100) can be written as: $(x, Q^{\mathsf T} Q y) = (x, y)$. Hence

$$Q^{\mathsf T} Q = E. \qquad (101)$$

Conversely, (101) implies (100) (for arbitrary vectors x, y).$^{32}$ From (101) it follows that $|Q|^2 = 1$, i.e.,

$$|Q| = \pm 1.$$

We shall call Q an orthogonal operator of the first kind (or proper) if $|Q| = 1$ and of the second kind (or improper) if $|Q| = -1$.


Symmetric, skew-symmetric, and orthogonal operators are special forms 
of a normal operator. 

We consider an arbitrary orthonormal basis in the given euclidean space. 
Suppose that in this basis A corresponds to the matrix $A = \| a_{ik} \|_1^n$ (here all the $a_{ik}$ are real numbers). The reader will have no difficulty in showing that the transposed operator $A^{\mathsf T}$ corresponds in this basis to the transposed matrix $A^{\mathsf T} = \| a^{\mathsf T}_{ik} \|_1^n$, where $a^{\mathsf T}_{ik} = a_{ki}$ $(i, k = 1, 2, \ldots, n)$. Hence it follows that in an orthonormal basis a normal operator A corresponds to a normal


32 The orthogonal operators in a euclidean space form a group, the so-called orthogonal 
group. 




matrix A ($AA^{\mathsf T} = A^{\mathsf T}A$), a symmetric operator S to a symmetric matrix $S = \| s_{ik} \|_1^n$ ($S^{\mathsf T} = S$), a skew-symmetric operator K to a skew-symmetric matrix $K = \| k_{ik} \|_1^n$ ($K^{\mathsf T} = -K$) and, finally, an orthogonal operator Q to an orthogonal matrix $Q = \| q_{ik} \|_1^n$ ($QQ^{\mathsf T} = E$).$^{33}$

Just as was done in § 8 for the adjoint operator, we can here make the following statement for the transposed operator:

If a subspace S of R is invariant with respect to a linear operator A, then the orthogonal complement T of S in R is invariant with respect to $A^{\mathsf T}$.
2. For the study of linear operators in a euclidean space R, we extend R to a unitary space $\tilde R$. This extension is made in the following way:

1. The vectors of R are called 'real' vectors.

2. We introduce 'complex' vectors $z = x + iy$, where x and y are real, i.e., $x \in R$, $y \in R$.

3. The operations of addition of complex vectors and of multiplication by a complex number are defined in the natural way. Then the set of all complex vectors forms an n-dimensional vector space $\tilde R$ over the field of complex numbers which contains R as a subspace.

4. In $\tilde R$ we introduce a hermitian metric such that in R it coincides with the existing euclidean metric. The reader can easily verify that the required hermitian metric is given in the following way:

If $z = x + iy$, $w = u + iv$ $(x, y, u, v \in R)$, then

$$(z, w) = (x, u) + (y, v) + i[(y, u) - (x, v)].$$

Setting $\bar z = x - iy$ and $\bar w = u - iv$, we have

$$(\bar z, \bar w) = \overline{(z, w)}.$$

If we choose a real basis, i.e., a basis of R, then $\tilde R$ will be the set of all vectors with complex coordinates and R the set of all vectors with real coordinates in this basis.

Every linear operator A in R extends uniquely to a linear operator in $\tilde R$:

$$A(x + iy) = Ax + iAy.$$

3. Among all the linear operators of $\tilde R$ those that are obtainable as the result of such an extension of operators of R can be characterized by the fact that they carry R into R ($AR \subset R$). These operators are called real.


33 The papers [138], [262a], [170b] are devoted to the study of the structure of orthogonal matrices. Orthogonal matrices, like orthogonal operators, are called proper and improper according as $|Q| = +1$ or $|Q| = -1$.




In a real basis real operators are determined by real matrices, i.e., mat- 
rices with real elements. 

A real operator A carries conjugate complex vectors $z = x + iy$, $\bar z = x - iy$ $(x, y \in R)$ into conjugate complex vectors:

$$Az = Ax + iAy, \qquad A\bar z = Ax - iAy \qquad (Ax, Ay \in R).$$

The secular equation of a real operator has real coefficients, so that when it has a root $\lambda$ of multiplicity p it also has the root $\bar\lambda$ with the multiplicity p. From $Az = \lambda z$ it follows that $A\bar z = \bar\lambda \bar z$, i.e., to conjugate characteristic values there correspond conjugate characteristic vectors.

The two-dimensional space $[z, \bar z]$ has a real basis:

$$x = \tfrac{1}{2}(z + \bar z), \qquad y = \tfrac{1}{2i}(z - \bar z).$$

We shall call the plane in R spanned by this basis an invariant plane of A corresponding to the pair of characteristic values $\lambda, \bar\lambda$. Let $\lambda = \mu + i\nu$. Then it is easy to see that

$$Ax = \mu x - \nu y,$$
$$Ay = \nu x + \mu y.$$
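The invariant plane can be computed for a real $2 \times 2$ matrix with a pair of complex characteristic values. In this sketch (sample matrix ours, with values $1 \pm i$) a characteristic vector z is taken as $(\lambda - d, c)$, which is valid when the lower-left entry c is nonzero; its real and imaginary parts give x and y, and in the basis x, y the matrix of A is the block $\begin{pmatrix} \mu & \nu \\ -\nu & \mu \end{pmatrix}$.

```python
import cmath

def matvec(M, v):
    return [sum(p * q for p, q in zip(row, v)) for row in M]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def close(u, v, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(u, v))

A = [[0, -2], [1, 2]]                           # real operator, characteristic values 1 +/- i
(a, b), (c, d) = A
tr, det = a + d, a * d - b * c
lam = (tr + cmath.sqrt(tr * tr - 4 * det)) / 2  # lambda = mu + i nu, nu != 0
mu, nu = lam.real, lam.imag
z = [lam - d, c]                                # A z = lambda z (valid because c != 0)
x = [v.real for v in z]                         # x = (z + conj z) / 2
y = [v.imag for v in z]                         # y = (z - conj z) / (2i)

Ax, Ay = matvec(A, x), matvec(A, y)
print(close(Ax, [mu * p - nu * q for p, q in zip(x, y)]),   # A x = mu x - nu y
      close(Ay, [nu * p + mu * q for p, q in zip(x, y)]))   # A y = nu x + mu y

P = [[x[0], y[0]], [x[1], y[1]]]                # columns x, y
block = matmul(inv2(P), matmul(A, P))           # matrix of A in the basis x, y
print(block)                                     # numerically [[mu, nu], [-nu, mu]]
```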


We consider a real operator A of simple structure with the characteristic values:

$$\lambda_{2k-1} = \mu_k + i\nu_k, \quad \lambda_{2k} = \mu_k - i\nu_k, \quad \lambda_l = \mu_l \qquad (k = 1, 2, \ldots, q;\ l = 2q+1, \ldots, n),$$

where $\mu_k, \nu_k, \mu_l$ are real and $\nu_k \neq 0$ $(k = 1, 2, \ldots, q)$. Then the characteristic vectors $z_1, z_2, \ldots, z_n$ corresponding to these characteristic values can be chosen such that

$$z_{2k-1} = x_k + i y_k, \quad z_{2k} = x_k - i y_k, \quad z_l = x_l \qquad (k = 1, 2, \ldots, q;\ l = 2q+1, \ldots, n). \qquad (102)$$

The vectors

$$x_1, y_1, x_2, y_2, \ldots, x_q, y_q, x_{2q+1}, \ldots, x_n \qquad (103)$$

form a basis of the euclidean space R. Here


34 If to the characteristic value $\lambda$ of the real operator A there correspond the linearly independent characteristic vectors $z_1, z_2, \ldots, z_p$, then to the characteristic value $\bar\lambda$ there correspond the linearly independent characteristic vectors $\bar z_1, \bar z_2, \ldots, \bar z_p$.




$$Ax_k = \mu_k x_k - \nu_k y_k, \qquad Ay_k = \nu_k x_k + \mu_k y_k \qquad (k = 1, 2, \ldots, q),$$
$$Ax_l = \mu_l x_l \qquad (l = 2q+1, \ldots, n). \qquad (104)$$

In the basis (103) there corresponds to the operator A the real quasi-diagonal matrix

$$\left\{ \begin{Vmatrix} \mu_1 & \nu_1 \\ -\nu_1 & \mu_1 \end{Vmatrix}, \ldots, \begin{Vmatrix} \mu_q & \nu_q \\ -\nu_q & \mu_q \end{Vmatrix}, \mu_{2q+1}, \ldots, \mu_n \right\}. \qquad (105)$$

Thus: For every operator A of simple structure in a euclidean space there exists a basis in which A corresponds to a matrix of the form (105). Hence it follows that: A real matrix of simple structure is real-similar to a canonical matrix of the form (105):

$$A = T \left\{ \begin{Vmatrix} \mu_1 & \nu_1 \\ -\nu_1 & \mu_1 \end{Vmatrix}, \ldots, \begin{Vmatrix} \mu_q & \nu_q \\ -\nu_q & \mu_q \end{Vmatrix}, \mu_{2q+1}, \ldots, \mu_n \right\} T^{-1} \qquad (T = \bar T). \qquad (106)$$

The transposed operator $A^{\mathsf T}$ of A in R upon extension becomes the adjoint operator A* of A in $\tilde R$. Therefore: Normal, symmetric, skew-symmetric, and orthogonal operators in R after the extension become normal, hermitian, hermitian multiplied by i, and unitary real operators in $\tilde R$.

It is easy to show that for a normal operator A in a euclidean space a canonical basis can be chosen as an orthonormal basis (103) for which (104) holds.$^{35}$ Therefore a real normal matrix is always real-similar and orthogonally-similar to a matrix of the form (105):

$$A = Q \left\{ \begin{Vmatrix} \mu_1 & \nu_1 \\ -\nu_1 & \mu_1 \end{Vmatrix}, \ldots, \begin{Vmatrix} \mu_q & \nu_q \\ -\nu_q & \mu_q \end{Vmatrix}, \mu_{2q+1}, \ldots, \mu_n \right\} Q^{-1} \qquad (Q = \bar Q = (Q^{\mathsf T})^{-1}). \qquad (107)$$


All the characteristic values of a symmetric operator S in a euclidean 
space are real, since after the extension the operator becomes hermitian. 
For a symmetric operator S we must set q = 0 in (104). Then we obtain:

$$S x_k = \mu_k x_k \qquad [(x_k, x_l) = \delta_{kl};\ k, l = 1, 2, \ldots, n]. \qquad (108)$$

A symmetric operator S in a euclidean space always has an orthonormal system of characteristic vectors with real characteristic values.$^{36}$ Therefore:


35 The orthonormality of the basis (102) in the hermitian metric implies the orthonor- 
mality of the basis (103) in the corresponding euclidean metric. 

36 The symmetric operator S is positive semidefinite if in (108) all $\mu_k \geq 0$ and positive definite if all $\mu_k > 0$.




A real symmetric matrix is always real-similar and orthogonally-similar to a diagonal matrix:

$$S = Q \{\mu_1, \mu_2, \ldots, \mu_n\} Q^{-1} \qquad (Q = \bar Q = (Q^{\mathsf T})^{-1}). \qquad (109)$$
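For a $2 \times 2$ real symmetric matrix the orthogonal matrix Q of (109) can be written down explicitly as a plane rotation whose angle satisfies $\tan 2\theta = 2 s_{12} / (s_{11} - s_{22})$. This is the classical Jacobi rotation, a standard technique we substitute here for illustration; it is not the argument used in the text. The sample matrix has characteristic values $3 \pm \sqrt 2$.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def jacobi_rotation(S):
    """Orthogonal Q with Q^T S Q diagonal, for a real symmetric 2x2 S.
    The angle solves tan(2 theta) = 2 s12 / (s11 - s22)."""
    (a, b), (_, d) = S
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

S = [[4.0, 1.0], [1.0, 2.0]]
Q = jacobi_rotation(S)
D = matmul(matmul(transpose(Q), S), Q)   # S = Q D Q^T, i.e. (109) with Q^{-1} = Q^T
print(D)  # off-diagonal entries ~0; diagonal entries 3 + sqrt(2) and 3 - sqrt(2)
```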


All the characteristic values of a skew-symmetric operator K in a euclidean space are pure imaginary (after the extension the operator is i times a hermitian operator). For a skew-symmetric operator we must set in (104):

$$\mu_1 = \mu_2 = \dots = \mu_q = \mu_{2q+1} = \dots = \mu_n = 0;$$

then the formulas assume the form


$$Kx_k = -\nu_k y_k,\quad Ky_k = \nu_k x_k,\quad Kx_l = 0 \qquad (k = 1, 2, \dots, q;\ l = 2q+1, \dots, n). \tag{110}$$


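Since K is a normal operator, the basis (103) can be assumed to be orthonormal. Thus: Every real skew-symmetric matrix is real-similar and orthogonally-similar to a canonical skew-symmetric matrix:

$$K = Q\left\{\begin{Vmatrix}0&\nu_1\\-\nu_1&0\end{Vmatrix},\ \dots,\ \begin{Vmatrix}0&\nu_q\\-\nu_q&0\end{Vmatrix},\ 0,\ \dots,\ 0\right\}Q^{-1} \qquad (Q = \bar Q = (Q^T)^{-1}). \tag{111}$$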


All the characteristic values of an orthogonal operator Q in a euclidean space are of modulus 1 (upon extension the operator becomes unitary). Therefore in the case of an orthogonal operator we must set in (104):

$$\mu_k^2 + \nu_k^2 = 1,\quad \mu_l = \pm 1 \qquad (k = 1, 2, \dots, q;\ l = 2q+1, \dots, n).$$


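Since Q is a normal operator, the basis (103) can again be assumed to be orthonormal. Setting $\mu_k = \cos\varphi_k$, $\nu_k = \sin\varphi_k$, we can represent the formulas (104) in the form

$$Qx_k = x_k\cos\varphi_k - y_k\sin\varphi_k,\quad Qy_k = x_k\sin\varphi_k + y_k\cos\varphi_k,\quad Qx_l = \pm x_l \qquad (k = 1, 2, \dots, q;\ l = 2q+1, \dots, n). \tag{112}$$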
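From what we have shown, it follows that: Every real orthogonal matrix is real-similar and orthogonally-similar to a canonical orthogonal matrix:

$$Q = Q_1\left\{\begin{Vmatrix}\cos\varphi_1&\sin\varphi_1\\-\sin\varphi_1&\cos\varphi_1\end{Vmatrix},\ \dots,\ \begin{Vmatrix}\cos\varphi_q&\sin\varphi_q\\-\sin\varphi_q&\cos\varphi_q\end{Vmatrix},\ \pm 1,\ \dots,\ \pm 1\right\}Q_1^{-1} \qquad (Q_1 = \bar Q_1 = (Q_1^T)^{-1}). \tag{113}$$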


286 IX. Linear Operators in a Unitary Space


Example. We consider an arbitrary finite rotation around the point O in a three-dimensional space. It carries a directed segment OA into a directed segment OB and can therefore be regarded as an operator Q in a three-dimensional vector space (formed by all possible segments OA). This operator is linear and orthogonal. Its determinant is +1, since Q does not change the orientation of the space.
Thus, Q is a proper orthogonal operator. For this operator the formulas (112) look as follows:

$$Qx_1 = x_1\cos\varphi - y_1\sin\varphi,\quad Qy_1 = x_1\sin\varphi + y_1\cos\varphi,\quad Qx_3 = \pm x_3.$$

From the equation $|Q| = 1$ it follows that $Qx_3 = x_3$. This means that all the points on the line through O in the direction of $x_3$ remain fixed. Thus we have obtained the Theorem of Euler-D'Alembert:

Every finite rotation of a rigid body around a fixed point can be obtained as a finite rotation by an angle φ around some fixed axis passing through that point.
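The theorem can be checked numerically. The following is a minimal sketch, not part of the original text. It assumes a proper orthogonal 3×3 matrix given as nested Python lists, and the helper names (`axis_angle`, `rot_x`, `rot_z`, `matmul`) are hypothetical. The angle comes from the trace, which is a similarity invariant and equals $2\cos\varphi + 1$ in the canonical basis of (112). The fixed axis comes from the skew part $R - R^T$.

```python
import math


def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def axis_angle(R):
    """Return (unit axis u, angle phi in [0, pi]) of a proper orthogonal 3x3 matrix R."""
    # The trace is a similarity invariant; in the canonical basis it is 2 cos(phi) + 1.
    tr = R[0][0] + R[1][1] + R[2][2]
    cos_phi = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    phi = math.acos(cos_phi)
    # R - R^T = 2 sin(phi) [u]_x, so the skew part points along the axis when sin(phi) != 0.
    v = (R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1])
    norm = math.sqrt(sum(c * c for c in v))
    if norm > 1e-9:
        return tuple(c / norm for c in v), phi
    if cos_phi > 0:
        return (0.0, 0.0, 1.0), 0.0  # R = E: every axis is fixed
    # phi = pi: R + E = 2 u u^T, so its column with the largest diagonal entry is parallel to u.
    M = [[R[i][j] + (1.0 if i == j else 0.0) for j in range(3)] for i in range(3)]
    j = max(range(3), key=lambda k: M[k][k])
    col = [M[i][j] for i in range(3)]
    n = math.sqrt(sum(c * c for c in col))
    return tuple(c / n for c in col), phi


# Two successive rotations about different axes act as a single rotation.
R = matmul(rot_x(0.4), rot_z(1.1))
u, phi = axis_angle(R)
Ru = [sum(R[i][k] * u[k] for k in range(3)) for i in range(3)]  # equals u: the axis is fixed
```

The composite `R` of two rotations about different axes leaves the computed `u` fixed, as the theorem predicts.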


§ 14. Polar Decomposition of an Operator and the Cayley Formulas 
in a Euclidean Space 


1. In §12 we established the polar decomposition of a linear operator in
a unitary space. In exactly the same way we obtain the polar decomposition 
of a linear operator in a euclidean space. 


THEOREM 9. Every linear operator A is representable in the form of a product³⁷

$$A = SQ, \tag{114}$$

$$A = Q_1S_1, \tag{115}$$

where S, $S_1$ are positive-semidefinite symmetric and Q, $Q_1$ are orthogonal operators; here $S = \sqrt{AA^T} = g(AA^T)$, $S_1 = \sqrt{A^TA} = h(A^TA)$, where g(λ) and h(λ) are real polynomials.

A is a normal operator if and only if S and Q ($S_1$ and $Q_1$) are permutable.
Similar statements hold for matrices.
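For 2×2 matrices the factors in (114) can be written down explicitly. The sketch below is only an illustration and not the book's construction. It assumes a non-singular real 2×2 matrix given as nested lists. It computes $S = \sqrt{AA^T}$ from the closed-form square root $(M + \sqrt{|M|}\,E)/\sqrt{\operatorname{tr}M + 2\sqrt{|M|}}$ of a positive definite 2×2 matrix M, which follows from the Hamilton-Cayley theorem, and then sets $Q = S^{-1}A$. The name `polar_2x2` is hypothetical.

```python
import math


def polar_2x2(A):
    """Return (S, Q) with A = S Q, S symmetric positive definite, Q orthogonal (A non-singular)."""
    # M = A A^T is symmetric positive definite because A is non-singular
    M = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
    d = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])  # sqrt(|M|) = |det A|
    t = math.sqrt(M[0][0] + M[1][1] + 2.0 * d)            # trace of sqrt(M)
    S = [[(M[i][j] + (d if i == j else 0.0)) / t for j in range(2)] for i in range(2)]
    det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det_S, -S[0][1] / det_S], [-S[1][0] / det_S, S[0][0] / det_S]]
    # Q = S^{-1} A is orthogonal: Q Q^T = S^{-1} (A A^T) S^{-1} = S^{-1} S^2 S^{-1} = E
    Q = [[sum(S_inv[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return S, Q


A = [[1.0, 2.0], [3.0, 4.0]]
S, Q = polar_2x2(A)
SQ = [[sum(S[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
QQt = [[sum(Q[i][k] * Q[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
det_Q = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
```

Here $|A| = -2 < 0$, so the orthogonal factor comes out improper ($|Q| = -1$). The dilatation S is positive definite because A is non-singular.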


37 As in Theorem 8, the operators S and S; are uniquely determined by A. If A is 
non-singular, then the orthogonal factors Q and Q, are also uniquely determined. 




Let us point out the geometrical content of the formulas (114) and (115). We let the vectors of an n-dimensional euclidean point space issue from the origin of the coordinate system. Then every vector is the radius vector of some point of the space. The orthogonal transformation realized by the operator Q (or $Q_1$) is a ‘rotation’ in this space, because it preserves the euclidean metric and leaves the origin of the coordinate system fixed.³⁸ The symmetric operator S (or $S_1$) represents a ‘dilatation’ of the n-dimensional space (i.e., a ‘stretching’ along mutually perpendicular directions with stretching factors $\varrho_1, \varrho_2, \dots, \varrho_n$ that are, in general, distinct; $\varrho_1, \varrho_2, \dots, \varrho_n$ are arbitrary non-negative numbers). According to the formulas (114) and (115), every linear homogeneous transformation of an n-dimensional euclidean space can be obtained by carrying out in succession some rotation and some dilatation (in either order).


2. Just as was done in the preceding section for a unitary operator, we now consider some representations of an orthogonal operator in a euclidean space R.

Let K be an arbitrary skew-symmetric operator ($K^T = -K$) and let

$$Q = e^K. \tag{116}$$

Then Q is a proper orthogonal operator. For

$$Q^T = e^{K^T} = e^{-K} = Q^{-1}$$

and $|Q| = 1$.³⁹


Let us show that every proper orthogonal operator is representable in the form (116). For this purpose we take the corresponding orthogonal matrix Q. Since $|Q| = 1$, we have, by (113),⁴⁰


38 For $|Q| = 1$ this is a proper rotation; but for $|Q| = -1$ it is a combination of a rotation and a reflection in a coordinate plane.

39 If $k_1, k_2, \dots, k_n$ are the characteristic values of K, then $\mu_1 = e^{k_1}, \mu_2 = e^{k_2}, \dots, \mu_n = e^{k_n}$ are the characteristic values of $Q = e^K$; moreover

$$|Q| = \mu_1\mu_2\cdots\mu_n = e^{k_1 + k_2 + \dots + k_n} = 1, \quad\text{since}\quad \sum_{j=1}^n k_j = \operatorname{tr}K = 0.$$

40 Among the characteristic values of a proper orthogonal matrix Q, the number of those equal to $-1$ is even. The diagonal matrix $\{-1, -1\}$ can be written in the form $\begin{Vmatrix}\cos\varphi&\sin\varphi\\-\sin\varphi&\cos\varphi\end{Vmatrix}$ for $\varphi = \pi$.


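$$Q = Q_1\left\{\begin{Vmatrix}\cos\varphi_1&\sin\varphi_1\\-\sin\varphi_1&\cos\varphi_1\end{Vmatrix},\ \dots,\ \begin{Vmatrix}\cos\varphi_q&\sin\varphi_q\\-\sin\varphi_q&\cos\varphi_q\end{Vmatrix},\ 1,\ \dots,\ 1\right\}Q_1^{-1} \qquad (Q_1 = \bar Q_1 = (Q_1^T)^{-1}). \tag{117}$$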


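We define the skew-symmetric matrix K by the equation

$$K = Q_1\left\{\begin{Vmatrix}0&\varphi_1\\-\varphi_1&0\end{Vmatrix},\ \dots,\ \begin{Vmatrix}0&\varphi_q\\-\varphi_q&0\end{Vmatrix},\ 0,\ \dots,\ 0\right\}Q_1^{-1}. \tag{118}$$

Since

$$\exp\begin{Vmatrix}0&\varphi\\-\varphi&0\end{Vmatrix} = \begin{Vmatrix}\cos\varphi&\sin\varphi\\-\sin\varphi&\cos\varphi\end{Vmatrix},$$

it follows from (117) and (118) that

$$Q = e^K. \tag{119}$$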


The matrix equation (119) implies the operator equation (116). 
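Both (116) and the 2×2 identity used to obtain (119) can be checked numerically. This is a minimal sketch that sums the exponential series directly. That approach is adequate only for small matrices of modest norm and is not a robust general-purpose matrix exponential. The names `expm` and `det3` and the sample matrix K are assumptions of the example.

```python
import math


def expm(K, terms=40):
    """exp(K) = sum_{m>=0} K^m / m!, summed directly (small matrices of modest norm only)."""
    n = len(K)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        # term <- term * K / m turns K^(m-1)/(m-1)! into K^m/m!
        term = [[sum(term[i][k] * K[k][j] for k in range(n)) / m for j in range(n)]
                for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


K = [[0.0, 0.3, -1.2], [-0.3, 0.0, 0.5], [1.2, -0.5, 0.0]]  # K^T = -K
Q = expm(K)
QtQ = [[sum(Q[k][i] * Q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
# QtQ is the identity and det3(Q) is 1 up to rounding: Q is proper orthogonal, as in (116)

phi = 0.7
R2 = expm([[0.0, phi], [-phi, 0.0]])  # [[cos phi, sin phi], [-sin phi, cos phi]]
```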

In order to represent an improper orthogonal operator we introduce a special operator W which is defined in an orthonormal basis $e_1, e_2, \dots, e_n$ by the equations

$$We_1 = e_1,\ \dots,\ We_{n-1} = e_{n-1},\quad We_n = -e_n. \tag{120}$$
W is an improper orthogonal operator. If Q is an arbitrary improper orthogonal operator, then $W^{-1}Q$ and $QW^{-1}$ are proper and are therefore representable in the form $e^K$ and $e^{K_1}$, where K and $K_1$ are skew-symmetric operators. Hence we obtain the formulas for an improper orthogonal operator

$$Q = We^K = e^{K_1}W. \tag{121}$$


The basis $e_1, e_2, \dots, e_n$ in (120) can be chosen such that it coincides with the basis $x_k, y_k, x_l$ $(k = 1, 2, \dots, q;\ l = 2q+1, \dots, n)$ in (110) and (112). The operator W so defined is permutable with K; therefore the two formulas (121) merge into one

$$Q = We^K \qquad (W = W^T = W^{-1},\ K^T = -K,\ WK = KW). \tag{122}$$


Let us now turn to the Cayley formulas, which establish a connection
between orthogonal and skew-symmetric operators in a euclidean space.
The formula 



$$Q = (E - K)(E + K)^{-1}, \tag{123}$$

as is easily verified, carries the skew-symmetric operator K into the orthogonal operator Q. Formula (123) enables us to express K in terms of Q:

$$K = (E - Q)(E + Q)^{-1}. \tag{124}$$


The formulas (123) and (124) establish a one-to-one correspondence between the skew-symmetric operators and those orthogonal operators that do not have the characteristic value $-1$. Instead of (123) and (124) we can take the formulas

$$Q = -(E - K)(E + K)^{-1}, \tag{125}$$

$$K = (E + Q)(E - Q)^{-1}. \tag{126}$$

In this case the number $+1$ plays the role of the exceptional value.
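Because (123) and (124) involve only rational operations, they can be checked in exact arithmetic. The sketch below is illustrative only. It assumes matrices are nested lists and uses Python's `fractions.Fraction`. The helper names and the sample K are hypothetical. Formula (124) has the same shape as (123), so one function serves both directions.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def add(X, Y, s=1):
    """Entrywise X + s*Y."""
    return [[X[i][j] + s * Y[i][j] for j in range(len(X[0]))] for i in range(len(X))]


def inverse(X):
    """Gauss-Jordan inversion over the rationals (X is assumed non-singular)."""
    n = len(X)
    M = [[Fraction(v) for v in row] + e for row, e in zip(X, identity(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def cayley(X):
    """(E - X)(E + X)^{-1}: formula (123) for X = K and formula (124) for X = Q."""
    E = identity(len(X))
    return matmul(add(E, X, -1), inverse(add(E, X)))


K = [[0, 1, 2], [-1, 0, 3], [-2, -3, 0]]  # K^T = -K
Q = cayley(K)
# Q^T Q = E exactly, and applying the same map to Q returns K
```

The inverse $(E + K)^{-1}$ always exists here, because the characteristic values of a real skew-symmetric K are pure imaginary.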


3. The polar decomposition of a real matrix in accordance with Theorem 9 enables us to obtain the fundamental formulas (107), (109), (111), and (113) without embedding the euclidean space in a unitary space, as was done above. This second approach to the fundamental formulas is based on the following theorem:


THEOREM 10: If two real normal matrices are similar,

$$B = T^{-1}AT \qquad (AA^T = A^TA,\ BB^T = B^TB,\ A = \bar A,\ B = \bar B), \tag{127}$$

then they are real-similar and orthogonally-similar:

$$B = Q^{-1}AQ \qquad (Q = \bar Q = (Q^T)^{-1}). \tag{128}$$


Proof: Since the normal matrices A and B have the same characteristic values, there exists a polynomial g(λ) (see 2. on p. 272) such that

$$A^T = g(A),\quad B^T = g(B).$$

Therefore the equation

$$g(B) = T^{-1}g(A)T,$$

which is a consequence of (127), can be written as follows:

$$B^T = T^{-1}A^TT. \tag{129}$$

When we go over to the transposed matrices in this equation, we obtain:

$$B = T^TA(T^T)^{-1}; \tag{130}$$

comparison of (127) with (130) shows that

$$TT^TA = ATT^T. \tag{131}$$


Now we make use of the polar decomposition of T:

$$T = SQ, \tag{132}$$

where $S = \sqrt{TT^T} = h(TT^T)$ (h(λ) a polynomial) is symmetric and Q is real and orthogonal. Since A, by (131), is permutable with $TT^T$, it is also permutable with $S = h(TT^T)$. Therefore, when we substitute the expression for T from (132) in (127), we have:

$$B = Q^{-1}S^{-1}ASQ = Q^{-1}AQ.$$


This completes the proof.
Let us consider the real canonical matrix

$$\left\{\begin{Vmatrix}\mu_1&\nu_1\\-\nu_1&\mu_1\end{Vmatrix},\ \dots,\ \begin{Vmatrix}\mu_q&\nu_q\\-\nu_q&\mu_q\end{Vmatrix},\ \mu_{2q+1},\ \dots,\ \mu_n\right\}. \tag{133}$$

The matrix (133) is normal and has the characteristic values $\mu_1 \pm i\nu_1, \dots, \mu_q \pm i\nu_q, \mu_{2q+1}, \dots, \mu_n$. Since normal matrices are of simple structure, every real normal matrix having the same characteristic values is similar (and by Theorem 10 real-similar and orthogonally-similar) to the matrix (133). Thus we arrive at the formula (107).

The formulas (109), (111), and (113) are obtained in exactly the same 
way. 


§ 15. Commuting Normal Operators 


1. In §10 we have shown that two commuting operators A and B in an n-dimensional unitary space R always have a common characteristic vector. By mathematical induction we can show that this statement is true not only for two, but for any finite number, of commuting operators. For given m pairwise commuting operators $A_1, A_2, \dots, A_m$ the first $m - 1$ of which have a common characteristic vector x, by repeating verbatim the argument of Lemma 1 (p. 270) (for A we take any $A_i$ $(i = 1, 2, \dots, m-1)$ and for B we take $A_m$), we obtain a vector y which is a common characteristic vector of $A_1, A_2, \dots, A_m$.

This statement is even true for an infinite set of commuting operators, because such a set can only contain a finite number ($\le n^2$) of linearly independent operators, and a common characteristic vector of the latter is a common characteristic vector of all the operators of the given set.


2. Now suppose that an arbitrary finite or infinite set of pairwise commuting normal operators A, B, C, ... is given. They all have a common characteristic vector $z_1$. We denote by $T_1$ the $(n-1)$-dimensional subspace consisting of all vectors of R that are orthogonal to $z_1$. By § 10, 3. (p. 272), the subspace $T_1$ is invariant with respect to A, B, C, .... Therefore all these operators have a common characteristic vector $z_2$ in $T_1$. We consider the orthogonal complement $T_2$ of the plane $[z_1, z_2]$ and select in it a vector $z_3$, etc. Thus we obtain an orthogonal system $z_1, z_2, \dots, z_n$ of common characteristic vectors of A, B, C, .... These vectors can be normalized. Hence we have proved:
Hence we have proved : 


THEOREM 11: If a finite or infinite set of pairwise commuting normal operators A, B, C, ... in a unitary space R is given, then all these operators have a complete orthonormal system of common characteristic vectors $z_1, z_2, \dots, z_n$:

$$Az_j = \lambda_j z_j,\quad Bz_j = \lambda'_j z_j,\quad Cz_j = \lambda''_j z_j,\ \dots \qquad [(z_j, z_k) = \delta_{jk};\ j, k = 1, 2, \dots, n]. \tag{134}$$


In matrix form, this theorem reads as follows:

THEOREM 11’: If a finite or infinite set of pairwise commuting normal matrices A, B, C, ... is given, then all these matrices can be carried by one and the same unitary transformation into diagonal form, i.e., there exists a unitary matrix U such that

$$A = U\{\lambda_1, \dots, \lambda_n\}U^{-1},\quad B = U\{\lambda'_1, \dots, \lambda'_n\}U^{-1},\quad C = U\{\lambda''_1, \dots, \lambda''_n\}U^{-1},\ \dots \qquad (U^{-1} = \bar U^T). \tag{135}$$
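The proof of Theorem 12 below diagonalizes a generic linear combination $P = \alpha A + \beta B + \cdots$. The same device gives a practical way to find a common basis in the special case of commuting real symmetric matrices. Such matrices are normal, and U can then be taken real orthogonal. The sketch assumes the classical cyclic Jacobi rotation method for the characteristic vectors of P. The function names, the sample matrices, and the choice $\alpha = 1$, $\beta = 0.5$ are assumptions of the example. A 'generic' choice must make the characteristic values of P distinct.

```python
import math


def jacobi_eigh(S, sweeps=50, tol=1e-24):
    """Characteristic values and orthonormal vectors (columns of V) of a real symmetric S."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # plane rotation J in the (p, q) plane chosen so that the new A[p][q] vanishes
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the characteristic vectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def congruent(U, M):
    """U^T M U, which equals U^{-1} M U for orthogonal U."""
    n = len(M)
    return [[sum(U[k][i] * M[k][l] * U[l][j] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]


A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]  # repeated characteristic value 1
B = [[3.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 5.0]]  # AB = BA
P = [[A[i][j] + 0.5 * B[i][j] for j in range(3)] for i in range(3)]  # P = A + 0.5 B
_, U = jacobi_eigh(P)
DA, DB = congruent(U, A), congruent(U, B)  # both come out diagonal
```

The characteristic vectors of A alone are not unique, because of the repeated value 1. Those of the generic combination P are unique up to sign, and they diagonalize A and B simultaneously.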


Now suppose that commuting normal operators in a euclidean space R are given. We denote by A, B, C, ... the linearly independent ones among them (their number is finite). We embed R (under preservation of the metric) in a unitary space $\tilde R$, as was done in §13. Then by Theorem 11, the operators A, B, C, ... have a complete orthonormal system of common characteristic vectors $z_1, z_2, \dots, z_n$ in $\tilde R$, i.e., (134) is satisfied.

We consider an arbitrary linear combination of A, B, C, ...:

$$P = \alpha A + \beta B + \gamma C + \cdots.$$


For arbitrary real values α, β, γ, ..., P is a real ($PR \subset R$) normal operator in $\tilde R$ and

$$Pz_j = \Lambda_j z_j,\quad \Lambda_j = \alpha\lambda_j + \beta\lambda'_j + \gamma\lambda''_j + \cdots \qquad [(z_j, z_k) = \delta_{jk};\ j, k = 1, 2, \dots, n]. \tag{136}$$

The characteristic values $\Lambda_j$ $(j = 1, 2, \dots, n)$ of P are linear forms in α, β, γ, .... Since P is real, these forms can be split into pairs of complex conjugates and real ones; with a suitable numbering of the characteristic vectors, we have

$$\Lambda_{2k-1} = M_k + iN_k,\quad \Lambda_{2k} = M_k - iN_k,\quad \Lambda_l = M_l \qquad (k = 1, 2, \dots, q;\ l = 2q+1, \dots, n), \tag{137}$$

where $M_k$, $N_k$, and $M_l$ are real linear forms in α, β, γ, ....
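We may assume that in (136) the corresponding vectors $z_{2k-1}$ and $z_{2k}$ are complex conjugates, and the $z_l$ real:

$$z_{2k-1} = \frac{x_k + iy_k}{\sqrt 2},\quad z_{2k} = \frac{x_k - iy_k}{\sqrt 2},\quad z_l = x_l \qquad (k = 1, 2, \dots, q;\ l = 2q+1, \dots, n). \tag{138}$$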


But then, as is easy to see, the real vectors

$$x_k,\ y_k,\ x_l \qquad (k = 1, 2, \dots, q;\ l = 2q+1, \dots, n) \tag{139}$$

form an orthonormal basis of R. In this canonical basis we have:⁴¹

$$Px_k = M_kx_k - N_ky_k,\quad Py_k = N_kx_k + M_ky_k,\quad Px_l = M_lx_l \qquad (k = 1, 2, \dots, q;\ l = 2q+1, \dots, n). \tag{140}$$


Since all the operators of the given set are obtained from P for special values of α, β, γ, ..., the basis (139), which does not depend on these parameters, is a common canonical basis for all the operators. Thus we have proved:


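THEOREM 12: If an arbitrary set of commuting normal linear operators in a euclidean space R is given, then all these operators have a common orthonormal canonical basis $x_k, y_k, x_l$:

$$\begin{aligned}
Ax_k &= \mu_k x_k - \nu_k y_k, & Bx_k &= \mu'_k x_k - \nu'_k y_k, & &\dots\\
Ay_k &= \nu_k x_k + \mu_k y_k, & By_k &= \nu'_k x_k + \mu'_k y_k, & &\dots\\
Ax_l &= \mu_l x_l, & Bx_l &= \mu'_l x_l, & &\dots
\end{aligned} \qquad (k = 1, 2, \dots, q;\ l = 2q+1, \dots, n). \tag{141}$$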


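We give the matrix form of Theorem 12:

THEOREM 12’: Every set of commuting normal real matrices A, B, C, ... can be carried by one and the same real orthogonal transformation Q into canonical form

$$\begin{aligned}
A &= Q\left\{\begin{Vmatrix}\mu_1&\nu_1\\-\nu_1&\mu_1\end{Vmatrix},\ \dots,\ \begin{Vmatrix}\mu_q&\nu_q\\-\nu_q&\mu_q\end{Vmatrix},\ \mu_{2q+1},\ \dots,\ \mu_n\right\}Q^{-1},\\
B &= Q\left\{\begin{Vmatrix}\mu'_1&\nu'_1\\-\nu'_1&\mu'_1\end{Vmatrix},\ \dots,\ \begin{Vmatrix}\mu'_q&\nu'_q\\-\nu'_q&\mu'_q\end{Vmatrix},\ \mu'_{2q+1},\ \dots,\ \mu'_n\right\}Q^{-1},\\
&\ \,\vdots
\end{aligned} \tag{142}$$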


41 The equation (140) follows from (136), (137), and (138). 




Note. If one of the operators A, B, C, ... (matrices A, B, C, ...), say A, is symmetric, then in the corresponding formulas (141) ((142)) all the ν are zero. In the case of skew-symmetry, all the μ are zero. In the case where A is an orthogonal operator (an orthogonal matrix), we have $\mu_k = \cos\varphi_k$, $\nu_k = \sin\varphi_k$, $\mu_l = \pm 1$ $(k = 1, 2, \dots, q;\ l = 2q+1, \dots, n)$.


CHAPTER X 


QUADRATIC AND HERMITIAN FORMS 


§ 1. Transformation of the Variables in a Quadratic Form 


1. A quadratic form is a homogeneous polynomial of the second degree in n variables $x_1, x_2, \dots, x_n$. A quadratic form always has a representation

$$\sum_{i,k=1}^n a_{ik}x_ix_k \qquad (a_{ik} = a_{ki};\ i, k = 1, 2, \dots, n),$$

where $A = \|a_{ik}\|_1^n$ is a symmetric matrix.
If we denote the column matrix $(x_1, x_2, \dots, x_n)$ by x and denote the quadratic form by

$$A(x, x) = \sum_{i,k=1}^n a_{ik}x_ix_k, \tag{1}$$

then we can write:¹

$$A(x, x) = x^TAx. \tag{2}$$


If $A = \|a_{ik}\|_1^n$ is a real symmetric matrix, then the form (1) is called real. In this chapter we shall mainly be concerned with real quadratic forms.

The determinant $|A| = |a_{ik}|_1^n$ is called the discriminant of the quadratic form A(x, x). The form is called singular if its discriminant is zero.

To every quadratic form there corresponds a bilinear form

$$A(x, y) = \sum_{i,k=1}^n a_{ik}x_iy_k \tag{3}$$

or

$$A(x, y) = x^TAy \qquad (x = (x_1, \dots, x_n),\ y = (y_1, \dots, y_n)). \tag{4}$$

If $x^1, x^2, \dots, x^l$, $y^1, y^2, \dots, y^m$ are column matrices and $c_1, c_2, \dots, c_l$, $d_1, d_2, \dots, d_m$ are scalars, then by the bilinearity of A(x, y) (see (4)),


1 The sign T denotes transposition. In (2) the quadratic form is represented as a product of three matrices: the row $x^T$, the square matrix A, and the column x.






$$A\Big(\sum_{i=1}^l c_ix^i,\ \sum_{j=1}^m d_jy^j\Big) = \sum_{i=1}^l\sum_{j=1}^m c_id_jA(x^i, y^j). \tag{5}$$


If A is a symmetric operator in an n-dimensional euclidean space and if in some orthonormal basis $e_1, e_2, \dots, e_n$ this operator corresponds to the matrix $A = \|a_{ik}\|_1^n$, then for arbitrary vectors

$$x = \sum_{i=1}^n x_ie_i,\qquad y = \sum_{i=1}^n y_ie_i$$

we have the identity²

$$A(x, y) = (Ax, y) = (x, Ay).$$

In particular,

$$A(x, x) = (Ax, x) = (x, Ax),$$

where

$$a_{ik} = (Ae_i, e_k) \qquad (i, k = 1, 2, \dots, n).$$


2. Let us see how the coefficient matrix of the form changes under a transformation of the variables:

$$x_i = \sum_{k=1}^n t_{ik}\xi_k \qquad (i = 1, 2, \dots, n). \tag{6}$$

In matrix notation, this transformation looks as follows:

$$x = T\xi. \tag{6'}$$

Here x, ξ are column matrices: $x = (x_1, x_2, \dots, x_n)$ and $\xi = (\xi_1, \xi_2, \dots, \xi_n)$; and T is the transforming matrix: $T = \|t_{ik}\|_1^n$.
Substituting the expression for x in (2), we obtain from (6′):

$$A(x, x) = \xi^TT^TAT\xi = \xi^T\tilde A\xi = \tilde A(\xi, \xi),$$

where

$$\tilde A = T^TAT. \tag{7}$$


The formula (7) expresses the coefficient matrix $\tilde A = \|\tilde a_{ik}\|_1^n$ of the transformed form $\tilde A(\xi, \xi) = \sum_{i,k=1}^n \tilde a_{ik}\xi_i\xi_k$ in terms of the coefficient matrix of the original form $A = \|a_{ik}\|_1^n$ and the transformation matrix $T = \|t_{ik}\|_1^n$.
It follows from (7) that under a transformation the discriminant of the form is multiplied by the square of the determinant of the transformation:


2 In A(x, y), the parentheses form part of the notation; in (Ax, y) and (x, Ay), they denote the scalar product.




$$|\tilde A| = |A|\,|T|^2. \tag{8}$$

In what follows we shall make use exclusively of non-singular transformations of the variables ($|T| \ne 0$). Under such transformations, as is clear from (7), the rank of the coefficient matrix remains unchanged (the rank of $\tilde A$ is the same as that of A). The rank of the coefficient matrix is usually called the rank of the quadratic form.
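The following is a small exact check of (6′), (7) and (8). The symmetric A and the non-singular T are chosen arbitrarily and are assumptions of the example, not taken from the text. The helper names are hypothetical. Fractions keep the determinant computation exact.

```python
from fractions import Fraction


def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def form(A, x):
    """A(x, x) = x^T A x, formula (2)."""
    n = len(A)
    return sum(A[i][k] * x[i] * x[k] for i in range(n) for k in range(n))


A = [[2, 1, 0], [1, 3, -1], [0, -1, 1]]
T = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]
A_new = matmul(matmul(transpose(T), A), T)                      # formula (7)
xi = [1, -2, 3]
x = [sum(T[i][k] * xi[k] for k in range(3)) for i in range(3)]  # formula (6'): x = T xi
# form(A, x) == form(A_new, xi); det(A) = 3, det(T) = 3, det(A_new) = 27, as in (8)
```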


DEFINITION 1: Two symmetric matrices A and $\tilde A$ connected as in formula (7), with $|T| \ne 0$, are called congruent.


Thus, a whole class of congruent symmetric matrices is associated with 
every quadratic form. As mentioned above, all these matrices have one and 
the same rank, the rank of the form. The rank is an invariant for the given 
class of matrices. In the real case, a second invariant is the so-called ‘signa- 
ture’ of the quadratic form. We shall now proceed to introduce this concept. 


§ 2. Reduction of a Quadratic Form to a Sum of Squares. 
The Law of Inertia 


1. A real quadratic form A(x, x) can be represented in an infinite number of ways in the form

$$A(x, x) = \sum_{i=1}^r a_iX_i^2, \tag{9}$$

where $a_i \ne 0$ $(i = 1, 2, \dots, r)$ and

$$X_i = \sum_{k=1}^n \alpha_{ik}x_k \qquad (i = 1, 2, \dots, r)$$

are linearly independent real linear forms in the variables $x_1, x_2, \dots, x_n$ (so that $r \le n$).

Let us consider a non-singular transformation of the variables under which the first r of the new variables $\xi_1, \xi_2, \dots, \xi_r$ are connected with $x_1, x_2, \dots, x_n$ by the formulas⁴

$$\xi_i = X_i \qquad (i = 1, 2, \dots, r).$$


Then, in the new variables, 


3 See p. 17.
4 We obtain the necessary transformation by adjoining to the system of linear forms $X_1, \dots, X_r$ such linear forms $X_{r+1}, \dots, X_n$ that the forms $X_j$ $(j = 1, 2, \dots, n)$ are linearly independent and then setting $\xi_j = X_j$ $(j = 1, 2, \dots, n)$.




$$A(x, x) = \tilde A(\xi, \xi) = \sum_{i=1}^r a_i\xi_i^2$$

and therefore $\tilde A = \{a_1, a_2, \dots, a_r, 0, \dots, 0\}$. But the rank of $\tilde A$ is r, and it equals the rank of A. Hence: The number of squares in the representation (9) is always equal to the rank of the form.


2. We shall show that not only is the total number of squares invariant in the various representations of A(x, x) in the form (9), but so also is the number of positive⁵ (and, hence, the number of negative) squares.


THEOREM 1 (The Law of Inertia for Quadratic Forms): In a representation of a real quadratic form A(x, x) as a sum of independent squares⁶

$$A(x, x) = \sum_{i=1}^r a_iX_i^2, \tag{9}$$

the number of positive and the number of negative squares are independent of the choice of the representation.


Proof. Let us assume that we have, in addition to (9), another representation of A(x, x) in the form of a sum of independent squares

$$A(x, x) = \sum_{i=1}^r b_iY_i^2$$

and that

$$a_1 > 0,\ a_2 > 0,\ \dots,\ a_g > 0,\quad a_{g+1} < 0,\ \dots,\ a_r < 0,$$

$$b_1 > 0,\ b_2 > 0,\ \dots,\ b_h > 0,\quad b_{h+1} < 0,\ \dots,\ b_r < 0.$$


Suppose that $g \ne h$, say $g < h$. Then in the identity

$$\sum_{i=1}^r a_iX_i^2 = \sum_{k=1}^r b_kY_k^2 \tag{10}$$

we give to the variables $x_1, x_2, \dots, x_n$ values that satisfy the system of $r - (h - g)$ equations

$$X_1 = 0,\ X_2 = 0,\ \dots,\ X_g = 0,\quad Y_{h+1} = 0,\ \dots,\ Y_r = 0, \tag{11}$$


5 By the number of positive (negative) squares in (9) we mean the number of positive (negative) $a_i$.

6 By a sum of independent squares we mean a sum of the form (9) in which all $a_i \ne 0$ and the forms $X_1, X_2, \dots, X_r$ are linearly independent.




and for which at least one of the forms $X_{g+1}, \dots, X_r$ does not vanish.⁷ For these values of the variables the left-hand side of the identity is

$$\sum_{j=g+1}^r a_jX_j^2 < 0,$$

and the right-hand side is

$$\sum_{k=1}^h b_kY_k^2 \ge 0.$$

Thus, the assumption $g \ne h$ has led to a contradiction, and the theorem is proved.

DEFINITION 2: The difference σ between the number π of positive squares and the number ν of negative squares in the representation of A(x, x) is called the signature of the form A(x, x). (Notation: $\sigma = \sigma[A(x, x)]$.)

The rank r and the signature σ determine the numbers π and ν uniquely, since

$$r = \pi + \nu,\qquad \sigma = \pi - \nu.$$


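Note that in (9) the positive factor $\sqrt{|a_i|}$ can be absorbed into the form $X_i$ $(i = 1, 2, \dots, r)$. Then (9) assumes the form

$$A(x, x) = X_1^2 + X_2^2 + \dots + X_\pi^2 - X_{\pi+1}^2 - \dots - X_r^2. \tag{12}$$

Setting⁸ $\xi_i = X_i$ $(i = 1, 2, \dots, r)$, we reduce A(x, x) to the canonical form

$$\tilde A(\xi, \xi) = \xi_1^2 + \xi_2^2 + \dots + \xi_\pi^2 - \xi_{\pi+1}^2 - \dots - \xi_r^2. \tag{13}$$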


Hence we deduce from Theorem 1 that: Every real symmetric matrix A is congruent to a diagonal matrix in which the diagonal elements are $+1$, $-1$, or 0:

$$A = T^T\{+1, \dots, +1, -1, \dots, -1, 0, \dots, 0\}T. \tag{14}$$


In the next section we shall give a rule for determining the signature 
from the coefficients of the quadratic form. 


7 Such values exist, since otherwise the equations $X_{g+1} = 0, \dots, X_r = 0$, and hence all the equations $X_1 = 0, X_2 = 0, \dots, X_r = 0$, would be consequences of the $r - (h - g)$ equations (11). This is impossible, because the linear forms $X_1, X_2, \dots, X_r$ are independent.

8 See footnote 4.




§3. The Methods of Lagrange and Jacobi of Reducing a Quadratic 
Form to a Sum of Squares 


It follows from the preceding section that in order to determine the rank and the signature of a form it is sufficient to reduce it in any way to a sum of independent squares.

We shall describe here two reduction methods: that of Lagrange and that of Jacobi.


1. Lagrange's Method. Let a quadratic form

$$A(x, x) = \sum_{i,k=1}^n a_{ik}x_ix_k$$

be given.
We consider two cases: 


1) For some g $(1 \le g \le n)$ the diagonal coefficient $a_{gg}$ is not equal to zero. Then we set

$$A(x, x) = \frac{1}{a_{gg}}\Big(\sum_{k=1}^n a_{gk}x_k\Big)^2 + A_1(x, x) \tag{15}$$

and convince ourselves by direct verification that the quadratic form $A_1(x, x)$ does not contain the variable $x_g$. This method of separating out a square in a quadratic form is always applicable when there is a non-zero diagonal element in the matrix $A = \|a_{ik}\|_1^n$.


2) $a_{gg} = 0$ and $a_{hh} = 0$, but $a_{gh} \ne 0$. Then we set:

$$A(x, x) = \frac{1}{2a_{gh}}\Big[\Big(\sum_{k=1}^n (a_{gk} + a_{hk})x_k\Big)^2 - \Big(\sum_{k=1}^n (a_{gk} - a_{hk})x_k\Big)^2\Big] + A_2(x, x). \tag{16}$$

The forms

$$\sum_{k=1}^n a_{gk}x_k,\qquad \sum_{k=1}^n a_{hk}x_k \tag{17}$$

are linearly independent, since the first contains $x_h$ but not $x_g$, and the second contains $x_g$ but not $x_h$. Therefore, in (16), the forms within the brackets are linearly independent (as sum and difference, respectively, of the independent linear forms (17)).

Therefore we have separated out two independent squares in A(x, x). Each of these squares contains $x_g$ and $x_h$, whereas $A_2(x, x)$ does not contain these variables, as is easy to verify.




By successive application of a combination of the methods 1) and 2), we can always reduce the form A(x, x) by means of rational operations to a sum of squares. Moreover, the squares so obtained are linearly independent, since at each stage the square that is separated out contains an unknown that does not occur in the subsequent squares.

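Note that the basic formulas (15) and (16) can be written as follows:

$$A(x, x) = \frac{1}{4a_{gg}}\Big(\frac{\partial A}{\partial x_g}\Big)^2 + A_1(x, x), \tag{15'}$$

$$A(x, x) = \frac{1}{8a_{gh}}\Big[\Big(\frac{\partial A}{\partial x_g} + \frac{\partial A}{\partial x_h}\Big)^2 - \Big(\frac{\partial A}{\partial x_g} - \frac{\partial A}{\partial x_h}\Big)^2\Big] + A_2(x, x). \tag{16'}$$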


Example.

$$A(x, x) = 4x_1^2 + x_2^2 + x_3^2 + x_4^2 - 4x_1x_2 - 4x_1x_3 + 4x_1x_4 + 4x_2x_3 - 4x_2x_4.$$

We apply formula (15′) with $g = 1$:

$$A(x, x) = \tfrac{1}{16}(8x_1 - 4x_2 - 4x_3 + 4x_4)^2 + A_1(x, x) = (2x_1 - x_2 - x_3 + x_4)^2 + A_1(x, x),$$

where

$$A_1(x, x) = 2x_2x_3 - 2x_2x_4 + 2x_3x_4.$$

We apply formula (16′) with $g = 2$ and $h = 3$:

$$A_1(x, x) = \tfrac{1}{8}(2x_2 + 2x_3)^2 - \tfrac{1}{8}(2x_3 - 2x_2 - 4x_4)^2 + A_2(x, x) = \tfrac{1}{2}(x_2 + x_3)^2 - \tfrac{1}{2}(x_3 - x_2 - 2x_4)^2 + A_2(x, x),$$

where

$$A_2(x, x) = 2x_4^2.$$

Finally

$$A(x, x) = (2x_1 - x_2 - x_3 + x_4)^2 + \tfrac{1}{2}(x_2 + x_3)^2 - \tfrac{1}{2}(x_3 - x_2 - 2x_4)^2 + 2x_4^2,$$

$$r = 4,\qquad \sigma = 2.$$
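The two cases of Lagrange's method translate directly into operations on the coefficient matrix. Separating out a square $c\,(u^Tx)^2$ amounts to replacing A by $A - c\,uu^T$. Case 1) takes u as the g-th row of A and $c = 1/a_{gg}$, as in (15). Case 2) takes the sum and difference of rows g and h with $c = \pm 1/(2a_{gh})$, as in (16). The sketch below is a minimal exact-arithmetic implementation under that reading, and the function name is hypothetical. Applied to the example above, it reproduces the four squares and the values $r = 4$, $\sigma = 2$.

```python
from fractions import Fraction


def lagrange_reduce(A):
    """Return [(c, u)] with x^T A x = sum of c * (u . x)^2, following cases 1) and 2)."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    squares = []

    def subtract(c, u):
        # removing the square c (u . x)^2 from the form means A <- A - c u u^T
        for i in range(n):
            for j in range(n):
                A[i][j] -= c * u[i] * u[j]

    while True:
        diag = [g for g in range(n) if A[g][g] != 0]
        if diag:  # case 1): a_gg != 0, formula (15)
            g = diag[0]
            c, u = 1 / A[g][g], A[g][:]
            squares.append((c, u))
            subtract(c, u)
            continue
        off = [(g, h) for g in range(n) for h in range(g + 1, n) if A[g][h] != 0]
        if not off:  # the remaining form is identically zero
            return squares
        g, h = off[0]  # case 2): a_gg = a_hh = 0, a_gh != 0, formula (16)
        c = 1 / (2 * A[g][h])
        u = [A[g][k] + A[h][k] for k in range(n)]
        v = [A[g][k] - A[h][k] for k in range(n)]
        squares += [(c, u), (-c, v)]
        subtract(c, u)
        subtract(-c, v)


A = [[4, -2, -2, 2], [-2, 1, 2, -2], [-2, 2, 1, 0], [2, -2, 0, 1]]  # the example above
sq = lagrange_reduce(A)
pi_ = sum(1 for c, _ in sq if c > 0)   # number of positive squares
nu = sum(1 for c, _ in sq if c < 0)    # number of negative squares
```

The first square is $\tfrac14(4x_1 - 2x_2 - 2x_3 + 2x_4)^2 = (2x_1 - x_2 - x_3 + x_4)^2$. The next two match the pair produced by (16′) with $g = 2$, $h = 3$, and the last is $\tfrac12(2x_4)^2 = 2x_4^2$.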


2. Jacobi's Method. We denote the rank of $A(x, x) = \sum_{i,k=1}^n a_{ik}x_ix_k$ by r and assume that

$$D_k = A\begin{pmatrix}1&2&\dots&k\\1&2&\dots&k\end{pmatrix} \ne 0 \qquad (k = 1, 2, \dots, r).$$

Then the symmetric matrix $A = \|a_{ik}\|_1^n$ can be reduced to the form




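$$G = \begin{Vmatrix}
g_{11} & g_{12} & \dots & g_{1r} & \dots & g_{1n}\\
0 & g_{22} & \dots & g_{2r} & \dots & g_{2n}\\
\vdots & & \ddots & & & \vdots\\
0 & 0 & \dots & g_{rr} & \dots & g_{rn}\\
0 & 0 & \dots & 0 & \dots & 0\\
\vdots & & & & & \vdots\\
0 & 0 & \dots & 0 & \dots & 0
\end{Vmatrix} \tag{18}$$

by Gauss's elimination algorithm (see Chapter II, §1).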
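The elements of G are expressed in terms of the elements of A by the well-known formulas⁹

$$g_{pq} = \frac{A\begin{pmatrix}1&2&\dots&p-1&p\\1&2&\dots&p-1&q\end{pmatrix}}{A\begin{pmatrix}1&2&\dots&p-1\\1&2&\dots&p-1\end{pmatrix}} \qquad (q = p, p+1, \dots, n;\ p = 1, 2, \dots, r). \tag{19}$$

In particular,

$$g_{pp} = \frac{D_p}{D_{p-1}} \qquad (p = 1, 2, \dots, r;\ D_0 = 1). \tag{20}$$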


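In Chapter II, § 4 (formula (55) on page 41) we have shown that

$$A = G^T\hat DG, \tag{21}$$

where $\hat D$ is the diagonal matrix:

$$\hat D = \left\{\frac{1}{D_1}, \frac{D_1}{D_2}, \dots, \frac{D_{r-1}}{D_r}, 0, \dots, 0\right\} = \left\{\frac{1}{g_{11}}, \frac{1}{g_{22}}, \dots, \frac{1}{g_{rr}}, 0, \dots, 0\right\}. \tag{22}$$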


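Without infringing (21) we may replace some of the zeros in the last $n - r$ rows of G by arbitrary elements. By such a replacement we can make G into a non-singular upper triangular matrix

$$T = \begin{Vmatrix}
g_{11} & g_{12} & \dots & g_{1r} & g_{1,r+1} & \dots & g_{1n}\\
0 & g_{22} & \dots & g_{2r} & g_{2,r+1} & \dots & g_{2n}\\
\vdots & & \ddots & & & & \vdots\\
0 & 0 & \dots & g_{rr} & g_{r,r+1} & \dots & g_{rn}\\
0 & 0 & \dots & 0 & g_{r+1,r+1} & \dots & g_{r+1,n}\\
\vdots & & & & & \ddots & \vdots\\
0 & 0 & \dots & 0 & 0 & \dots & g_{nn}
\end{Vmatrix} \qquad (|T| \ne 0). \tag{23}$$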


9 See Chapter II, § 2.


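The equation (21) can then be rewritten:

$$A = T^T\hat DT. \tag{24}$$

From this equation it follows that the quadratic form¹⁰

$$\hat D(\xi, \xi) = \sum_{k=1}^r \frac{D_{k-1}}{D_k}\xi_k^2 \qquad (\xi = (\xi_1, \xi_2, \dots, \xi_n);\ D_0 = 1)$$

goes over into the form A(x, x) under the transformation

$$\xi = Tx.$$

Since

$$\xi_k = X_k,\qquad X_k = g_{kk}x_k + g_{k,k+1}x_{k+1} + \dots + g_{kn}x_n \qquad (k = 1, 2, \dots, r),$$

we have Jacobi's formula¹¹

$$A(x, x) = \sum_{k=1}^r \frac{D_{k-1}}{D_k}X_k^2 = \sum_{k=1}^r \frac{X_k^2}{g_{kk}} \qquad (D_0 = 1). \tag{26}$$

This formula gives a representation of A(x, x) in the form of a sum of r independent squares.¹²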
Jacobi's formula is often given in another form.

Instead of $X_k$ $(k = 1, 2, \dots, r)$, the linearly independent forms

$$Y_k = D_{k-1}X_k \qquad (k = 1, 2, \dots, r;\ D_0 = 1) \tag{27}$$

are introduced. Then Jacobi's formula (26) can be written as:

$$A(x, x) = \sum_{k=1}^r \frac{Y_k^2}{D_{k-1}D_k}. \tag{28}$$

Here

$$Y_k = c_{kk}x_k + c_{k,k+1}x_{k+1} + \dots + c_{kn}x_n \qquad (k = 1, 2, \dots, r), \tag{29}$$

where
10 We regard $\hat D(\xi, \xi)$ as a quadratic form in the n variables $\xi_1, \xi_2, \dots, \xi_n$.

11 Another approach to Jacobi's formula, which does not depend on (21), can be found, for example, in [17], pp. 43-44.

12 The independence of the squares in Jacobi's formula follows from the fact that the form A(x, x) is of rank r. But we can also convince ourselves directly of the independence of the forms $X_1, X_2, \dots, X_r$. For, according to (20), $g_{kk} = D_k/D_{k-1} \ne 0$, and therefore $X_k$ contains the variable $x_k$, which does not occur in the forms $X_{k+1}, \dots, X_r$ $(k = 1, 2, \dots, r)$. Hence $X_1, X_2, \dots, X_r$ are linearly independent forms.




$$c_{kq} = A\begin{pmatrix}1&2&\dots&k-1&k\\1&2&\dots&k-1&q\end{pmatrix} \qquad (q = k, k+1, \dots, n;\ k = 1, 2, \dots, r). \tag{30}$$
Example.

$$A(x, x) = x_1^2 + 3x_2^2 - 3x_4^2 - 4x_1x_2 + 2x_1x_3 - 2x_1x_4 - 6x_2x_3 + 8x_2x_4 + 2x_3x_4.$$

We reduce the matrix

$$A = \begin{Vmatrix} 1&-2&1&-1\\ -2&3&-3&4\\ 1&-3&0&1\\ -1&4&1&-3 \end{Vmatrix}$$

to the Gaussian form

$$G = \begin{Vmatrix} 1&-2&1&-1\\ 0&-1&-1&2\\ 0&0&0&0\\ 0&0&0&0 \end{Vmatrix}.$$

Hence $r = 2$, $g_{11} = 1$, $g_{22} = -1$.
Jacobi's formula (26) yields:

$$A(x, x) = (x_1 - 2x_2 + x_3 - x_4)^2 - (-x_2 - x_3 + 2x_4)^2.$$


Jacobi's formula (28) yields the following theorem:

THEOREM 2 (Jacobi). If for the quadratic form

$$A(x, x) = \sum_{i,k=1}^n a_{ik}x_ix_k$$

of rank r the inequalities

$$D_k = A\begin{pmatrix}1&2&\dots&k\\1&2&\dots&k\end{pmatrix} \ne 0 \qquad (k = 1, 2, \dots, r) \tag{31}$$

hold, then the number π of positive squares and the number ν of negative squares of A(x, x) coincide, respectively, with the number P of permanences of sign and the number V of variations of sign in the sequence

$$1,\ D_1,\ D_2,\ \dots,\ D_r, \tag{32}$$

i.e., $\pi = P(1, D_1, D_2, \dots, D_r)$, $\nu = V(1, D_1, D_2, \dots, D_r)$, and the signature

$$\sigma = r - 2V(1, D_1, D_2, \dots, D_r). \tag{33}$$
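Jacobi's method and Theorem 2 can be sketched in a few lines. The sketch eliminates without row exchanges as in (18), reads off the $g_{kk}$ and the forms $X_k$, builds $D_k = D_{k-1}g_{kk}$ from (20), and counts sign variations in $1, D_1, \dots, D_r$. It is an illustrative sketch in exact arithmetic with hypothetical function names, and it is valid only under the hypothesis (31). On the example above it reproduces $r = 2$ and $\sigma = 0$.

```python
from fractions import Fraction


def jacobi_squares(A):
    """Return [(1/g_kk, X_k)] with A(x, x) = sum X_k^2 / g_kk, formula (26).

    Assumes D_1, ..., D_r != 0 (r = rank), as in Jacobi's method."""
    n = len(A)
    G = [[Fraction(v) for v in row] for row in A]
    r = 0
    while r < n and G[r][r] != 0:
        p = r
        for i in range(p + 1, n):  # Gaussian elimination without row exchanges, form (18)
            f = G[i][p] / G[p][p]
            G[i] = [a - f * b for a, b in zip(G[i], G[p])]
        r += 1
    if any(v != 0 for row in G[r:] for v in row):
        raise ValueError("some D_k with k <= rank vanishes; Jacobi's method does not apply")
    return [(1 / G[k][k], G[k]) for k in range(r)]


def sign_variations(seq):
    """Number V of sign changes in a sequence of non-zero numbers."""
    return sum(1 for a, b in zip(seq, seq[1:]) if (a > 0) != (b > 0))


A = [[1, -2, 1, -1], [-2, 3, -3, 4], [1, -3, 0, 1], [-1, 4, 1, -3]]
squares = jacobi_squares(A)
r = len(squares)
D = [Fraction(1)]
for c, _ in squares:
    D.append(D[-1] / c)        # D_k = D_{k-1} g_kk, by (20)
V = sign_variations(D)
sigma = r - 2 * V              # formula (33)
```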




Note 1. If in the sequence $1, D_1, \dots, D_r$ ($D_r \ne 0$) there are zeros, but not three in succession, then the signature can be determined by the use of the formula

$$\sigma = r - 2V(1, D_1, D_2, \dots, D_r),$$

omitting a zero $D_k$ provided $D_{k-1}D_{k+1} \ne 0$, and setting

$$V(D_{k-1}, D_k, D_{k+1}, D_{k+2}) = \begin{cases} 1, & \text{when } \dfrac{D_{k+2}}{D_{k-1}} < 0,\\[2mm] 2, & \text{when } \dfrac{D_{k+2}}{D_{k-1}} > 0, \end{cases} \tag{34}$$

if $D_k = D_{k+1} = 0$.

We state this rule without proof.¹³

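Note 2. When three consecutive zeros occur in $D_1, D_2, \dots, D_{r-1}$, then the signature of the quadratic form cannot be immediately determined by Jacobi's Theorem. In this case, the signs of the non-zero $D_k$ do not determine the signature of the form. This is shown by the following example:

$$A(x, x) = 2a_1x_1x_4 + a_2x_2^2 + a_3x_3^2 \qquad (a_1a_2a_3 \ne 0).$$

Here

$$D_1 = D_2 = D_3 = 0,\qquad D_4 = -a_1^2a_2a_3 \ne 0.$$

But the number of negative squares is

$$\nu = \begin{cases} 1, & \text{when } a_2 > 0,\ a_3 > 0,\\ 3, & \text{when } a_2 < 0,\ a_3 < 0, \end{cases}$$

so that $\sigma = 2$ in the first case and $\sigma = -2$ in the second. In both cases, $D_4 < 0$.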
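Note 3. If $D_1 \ne 0, \dots, D_{r-1} \ne 0$, but $D_r = 0$, then the signs of $D_1, D_2, \dots, D_{r-1}$ do not determine the signature of the form. As a corroborating example, we can take the form

$$ax_1^2 + ax_2^2 + bx_3^2 + 2ax_1x_2 + 2ax_2x_3 + 2ax_1x_3 = a(x_1 + x_2 + x_3)^2 + (b - a)x_3^2.$$

For $a \ne 0$, $b \ne a$ this form has rank $r = 2$, with $D_1 = a$ and $D_2 = 0$, while its signature depends on the sign of $b - a$.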
§ 4. Positive Quadratic Forms 


1. In this section we deal with the special, but important, class of positive quadratic forms.

DEFINITION 3: A real quadratic form $A(x, x) = \sum_{i,k=1}^n a_{ik}x_ix_k$ is called positive (negative) semidefinite if for arbitrary real values of the variables:

$$A(x, x) \ge 0 \quad (\le 0). \tag{35}$$


13 This rule was found in the case of a single zero $D_k$ by Gundelfinger and for two successive zeros $D_k$ by Frobenius [162].




DEFINITION 4: A real quadratic form $A(x, x) = \sum_{i,k=1}^n a_{ik}x_ix_k$ is called positive (negative) definite if for arbitrary values of the variables, not all zero ($x \ne 0$),

$$A(x, x) > 0 \quad (< 0). \tag{36}$$


The class of positive (negative) definite forms is part of the class of 
positive (negative) semidefinite forms. 

Let A(x, x) be a positive-semidefinite form. We represent it in the form of a sum of linearly independent squares:

$$A(x, x) = \sum_{i=1}^r a_iX_i^2. \tag{37}$$

In this representation, all the squares must be positive:

$$a_i > 0 \qquad (i = 1, 2, \dots, r). \tag{38}$$

For if any $a_i$ were negative, then we could select values of $x_1, x_2, \dots, x_n$ for which

$$X_1 = \dots = X_{i-1} = X_{i+1} = \dots = X_r = 0,\qquad X_i \ne 0.$$

But then A(x, x) would have a negative value for these values of the variables, and by assumption this is impossible. It is clear that, conversely, it follows from (37) and (38) that the form A(x, x) is positive semidefinite.

Thus, a positive-semidefinite quadratic form is characterized by the equations $\sigma = r$ $(\pi = r,\ \nu = 0)$.

Now let A(x, x) be a positive-definite form. Then A(x, x) is also positive semidefinite. Therefore it is representable in the form (37), where all the $a_i$ $(i = 1, 2, \dots, r)$ are positive. From the positive definiteness it follows that $r = n$. For if $r < n$, we could find values of $x_1, x_2, \dots, x_n$, not all zero, such that all the $X_i$ would be zero. But then by (37) $A(x, x) = 0$ for $x \ne 0$, and this contradicts (36).

It is easy to see that, conversely, if in (37) $r = n$ and all the $a_1, a_2, \dots, a_n$ are positive, then A(x, x) is a positive-definite form.

In other words: A positive-semidefinite form is positive definite if and only if it is not singular.


2. The following theorem gives a criterion for positive definiteness in the form of inequalities which the coefficients of the form must satisfy. We shall use the notation of the preceding section for the sequence of the principal minors of A:

$$D_1 = a_{11},\quad D_2 = \begin{vmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{vmatrix},\quad \dots,\quad D_n = \begin{vmatrix}a_{11}&a_{12}&\dots&a_{1n}\\a_{21}&a_{22}&\dots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{n1}&a_{n2}&\dots&a_{nn}\end{vmatrix}.$$


THEOREM 3: A quadratic form is positive definite if and only if

$$D_1 > 0,\ D_2 > 0,\ \dots,\ D_n > 0. \tag{39}$$


Proof. The sufficiency of the conditions (39) follows immediately from 
Jacobi’s formula (28). The necessity of (39) is established as follows. 


From the fact that $A(x,x)=\sum_{i,k=1}^{n}a_{ik}x_ix_k$ is positive definite, it follows that the 'restricted' forms¹⁴

$$A_p(x,x)=\sum_{i,k=1}^{p}a_{ik}x_ix_k\qquad(p=1,2,\dots,n)$$

are also positive definite. But then all these forms must be non-singular, i.e.,

$$D_p=|A_p|\neq 0\qquad(p=1,2,\dots,n).$$

We are now in a position to apply Jacobi's formula (28) (for $r=n$). Since all the squares on the right-hand side of the formula must be positive, we have

$$D_1>0,\quad D_1D_2>0,\quad D_2D_3>0,\quad\dots,\quad D_{n-1}D_n>0.$$


Hence the inequality (39) follows, and the theorem is proved. 
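Criterion (39) is easy to apply mechanically. The following minimal sketch in plain Python (function names are hypothetical) computes the successive principal minors $D_1,\dots,D_n$ by cofactor expansion, which keeps integer input exact but is only practical for small $n$, and then tests their positivity.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small n)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def leading_minors(a):
    """The successive principal minors D_1, D_2, ..., D_n of a square matrix."""
    return [det([row[:p] for row in a[:p]]) for p in range(1, len(a) + 1)]


def is_positive_definite(a):
    """Criterion (39): every successive principal minor is positive."""
    return all(d > 0 for d in leading_minors(a))


# Coefficient matrices of two sample forms (integer entries keep the minors exact).
A1 = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
A2 = [[1, 2], [2, 1]]
print(leading_minors(A1), is_positive_definite(A1))   # [2, 3, 4] True
print(leading_minors(A2), is_positive_definite(A2))   # [1, -3] False
```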


Since every principal minor of $A$ can be brought into the top left corner by a suitable numbering of the variables, we have the

COROLLARY: In a positive-definite quadratic form $A(x,x)=\sum_{i,k=1}^{n}a_{ik}x_ix_k$ all the principal minors of the coefficient matrix are positive:¹⁵

$$A\begin{pmatrix}i_1&i_2&\cdots&i_p\\ i_1&i_2&\cdots&i_p\end{pmatrix}>0\qquad(1\le i_1<i_2<\dots<i_p\le n;\ p=1,2,\dots,n).$$
dy 8g ee by 
Note. If the successive principal minors are non-negative,

$$D_1\ge 0,\quad D_2\ge 0,\quad\dots,\quad D_n\ge 0, \tag{40}$$


14 The form $A_p(x,x)$ is obtained from $A(x,x)$ if we set $x_{p+1}=\dots=x_n=0$ in the latter ($p=1,2,\dots,n$).

15 Thus, when the successive principal minors of a real symmetric matrix are positive, all the remaining principal minors are then also positive.


§ 4. PosITIvE QuapRatic Forms 307 
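it does not follow that $A(x,x)$ is positive semidefinite. For the form

$$a_{11}x_1^2+2a_{12}x_1x_2+a_{22}x_2^2,$$

in which $a_{11}=a_{12}=0$, $a_{22}<0$, satisfies (40) (here $D_1=D_2=0$), but is not positive semidefinite, since it takes the value $a_{22}<0$ at $x_1=0$, $x_2=1$.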


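However, we have the following theorem.

THEOREM 4: A quadratic form $A(x,x)=\sum_{i,k=1}^{n}a_{ik}x_ix_k$ is positive semidefinite if and only if all the principal minors of its coefficient matrix are non-negative:

$$A\begin{pmatrix}i_1&i_2&\cdots&i_p\\ i_1&i_2&\cdots&i_p\end{pmatrix}\ge 0\qquad(1\le i_1<i_2<\dots<i_p\le n;\ p=1,2,\dots,n). \tag{41}$$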


Proof. We introduce the auxiliary form

$$A_\varepsilon(x,x)=A(x,x)+\varepsilon\sum_{i=1}^{n}x_i^2\qquad(\varepsilon>0).$$

Obviously $\lim_{\varepsilon\to 0}A_\varepsilon(x,x)=A(x,x)$.


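The fact that $A(x,x)$ is positive semidefinite implies that $A_\varepsilon(x,x)$ is positive definite (indeed, $A_\varepsilon(x,x)\ge\varepsilon\sum_{i=1}^{n}x_i^2>0$ for $x\neq 0$), so that we have the inequality (cf. Corollary to Theorem 3):

$$A_\varepsilon\begin{pmatrix}i_1&i_2&\cdots&i_p\\ i_1&i_2&\cdots&i_p\end{pmatrix}>0\qquad(1\le i_1<i_2<\dots<i_p\le n;\ p=1,2,\dots,n).$$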


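Proceeding to the limit for $\varepsilon\to 0$, we obtain (41).
Suppose, conversely, that (41) holds. Then we have

$$A_\varepsilon\begin{pmatrix}i_1&i_2&\cdots&i_p\\ i_1&i_2&\cdots&i_p\end{pmatrix}=\varepsilon^p+\dots\ge\varepsilon^p>0\qquad(1\le i_1<i_2<\dots<i_p\le n;\ p=1,2,\dots,n),$$

because in the expansion of this determinant in powers of $\varepsilon$ the coefficients of the lower powers are sums of principal minors of $A$, which are non-negative by (41). But then (by Theorem 3) $A_\varepsilon(x,x)$ is positive definite:

$$A_\varepsilon(x,x)>0\qquad(x\neq 0).$$

Proceeding to the limit for $\varepsilon\to 0$ we obtain:

$$A(x,x)\ge 0.$$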


This completes the proof. 
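The gap between (40) and (41) already shows up for the form of the Note. A minimal sketch in plain Python (hypothetical names; cofactor expansion, so only small matrices are sensible) enumerates every principal minor with `itertools.combinations` and applies criterion (41).

```python
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small n)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def principal_minors(a):
    """Yield (indices, minor) for every principal minor, indices i_1 < ... < i_p (0-based)."""
    n = len(a)
    for p in range(1, n + 1):
        for idx in combinations(range(n), p):
            yield idx, det([[a[i][k] for k in idx] for i in idx])


def is_positive_semidefinite(a):
    """Criterion (41): all principal minors are non-negative."""
    return all(minor >= 0 for _, minor in principal_minors(a))


# The form of the Note with a11 = a12 = 0, a22 = -1: D_1 = D_2 = 0, so (40) holds ...
C = [[0, 0], [0, -1]]
print([det([row[:p] for row in C[:p]]) for p in (1, 2)])   # [0, 0]
# ... but the principal minor a22 = -1 violates (41).
print(is_positive_semidefinite(C))                          # False
# (x1 + x2)^2 has principal minors 1, 1, 0 and is positive semidefinite.
print(is_positive_semidefinite([[1, 1], [1, 1]]))           # True
```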


The conditions for a form to be negative definite and negative semidefinite are obtained from (39) and (41), respectively, when these inequalities are applied to $-A(x,x)$.




THEOREM 5: A quadratic form $A(x,x)$ is negative definite if and only if the following inequalities hold:

$$D_1<0,\quad D_2>0,\quad D_3<0,\quad\dots,\quad(-1)^nD_n>0. \tag{42}$$

THEOREM 6: A quadratic form $A(x,x)$ is negative semidefinite if and only if the following inequalities hold:

$$(-1)^pA\begin{pmatrix}i_1&i_2&\cdots&i_p\\ i_1&i_2&\cdots&i_p\end{pmatrix}\ge 0\qquad(1\le i_1<i_2<\dots<i_p\le n;\ p=1,2,\dots,n). \tag{43}$$
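Criterion (42) is just (39) applied to $-A$, since the $p$-th successive principal minor of $-A$ equals $(-1)^pD_p$. A minimal sketch in plain Python (hypothetical names) checks the two formulations against each other.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small n)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def is_positive_definite(a):
    """Criterion (39)."""
    return all(det([row[:p] for row in a[:p]]) > 0 for p in range(1, len(a) + 1))


def is_negative_definite(a):
    """Criterion (42): (-1)^p D_p > 0 for p = 1, ..., n."""
    return all((-1) ** p * det([row[:p] for row in a[:p]]) > 0
               for p in range(1, len(a) + 1))


def negate(a):
    return [[-v for v in row] for row in a]


N = [[-3, 1, 0], [1, -2, 1], [0, 1, -1]]      # D_1 = -3, D_2 = 5, D_3 = -2
print(is_negative_definite(N), is_positive_definite(negate(N)))   # True True
```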


§ 5. Reduction of a Quadratic Form to Principal Axes


1. We consider an arbitrary real quadratic form

$$A(x,x)=\sum_{i,k=1}^{n}a_{ik}x_ix_k.$$

Its coefficient matrix $A=\|a_{ik}\|_1^n$ is real and symmetric. Therefore (see Chapter IX, § 13) it is orthogonally similar to a real diagonal matrix $\Lambda$, i.e., there exists a real orthogonal matrix $Q$ such that

$$\Lambda=Q^{-1}AQ\qquad(\Lambda=\|\lambda_i\delta_{ik}\|_1^n,\ QQ^T=E). \tag{44}$$

Here $\lambda_1,\lambda_2,\dots,\lambda_n$ are the characteristic values of $A$.
Since for an orthogonal matrix $Q^{-1}=Q^T$, it follows from (44) that under the orthogonal transformation of the variables

$$x=Q\xi\qquad(QQ^T=E) \tag{45}$$

or, in greater detail,

$$x_i=\sum_{k=1}^{n}q_{ik}\xi_k\qquad\Bigl(\sum_{j=1}^{n}q_{ij}q_{kj}=\delta_{ik};\ i,k=1,2,\dots,n\Bigr), \tag{45'}$$

the form $A(x,x)$ goes over into

$$\sum_{k=1}^{n}\lambda_k\xi_k^2. \tag{46}$$


' §5. RepuctTIon To PrinciPaL AXES 309 


THEOREM 7: Every real quadratic form $A(x,x)=\sum_{i,k=1}^{n}a_{ik}x_ix_k$ can be reduced to the canonical form (46) by an orthogonal transformation, where $\lambda_1,\lambda_2,\dots,\lambda_n$ are the characteristic values of $A=\|a_{ik}\|_1^n$.
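Theorem 7 is an existence statement; the text does not prescribe how to compute $Q$. As one possible numerical route (our choice, not the book's), the classical Jacobi rotation method builds $Q$ as a product of plane rotations. Each rotation has determinant $+1$, so the resulting $Q$ is proper orthogonal, in line with footnote 16. This is a minimal sketch in plain Python with hypothetical function names.

```python
import math


def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def principal_axes(a, tol=1e-13, max_sweeps=50):
    """Cyclic Jacobi method for a real symmetric matrix a.

    Returns (lams, Q) with Q orthogonal and Q^T a Q = diag(lams), so the
    substitution x = Q xi turns A(x, x) into sum(lams[k] * xi_k**2), as in (46).
    """
    n = len(a)
    a = [[float(v) for v in row] for row in a]
    q = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = max((abs(a[i][j]) for i in range(n) for j in range(n) if i != j), default=0.0)
        if off < tol:
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if a[p][r] == 0.0:
                    continue
                # rotation angle that makes the (p, r) entry of G^T a G vanish
                theta = 0.5 * math.atan2(2.0 * a[p][r], a[r][r] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                g = [[float(i == j) for j in range(n)] for i in range(n)]
                g[p][p] = g[r][r] = c
                g[p][r], g[r][p] = s, -s
                a = matmul(transpose(g), matmul(a, g))
                q = matmul(q, g)
    return [a[i][i] for i in range(n)], q


lams, Q = principal_axes([[2, 1], [1, 2]])   # the form 2x1^2 + 2x1x2 + 2x2^2
print(lams)                                   # approximately [1.0, 3.0]
```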


The reduction of the quadratic form $A(x,x)$ to the canonical form (46) is called reduction to principal axes. The reason for this name is that the equation of a central hypersurface of the second order

$$\sum_{i,k=1}^{n}a_{ik}x_ix_k=c\qquad(c=\text{const}\neq 0) \tag{47}$$

under the orthogonal transformation (45') of the variables assumes the canonical form

$$\sum_{i=1}^{n}\varepsilon_i\frac{\xi_i^2}{a_i^2}=1\qquad\Bigl(\frac{c}{\lambda_i}=\varepsilon_ia_i^2;\ \varepsilon_i=\pm1;\ i=1,2,\dots,n\Bigr). \tag{48}$$

If we regard $x_1,x_2,\dots,x_n$ as coordinates in an orthonormal basis of an $n$-dimensional euclidean space, then $\xi_1,\xi_2,\dots,\xi_n$ are the coordinates in a new orthonormal basis of the same space, and the 'rotation'¹⁶ of the axes is brought about by the orthogonal transformation (45). The new coordinate axes are axes of symmetry of the central surface (47) and are usually called its principal axes.


2. It follows from (46) that the rank $r$ of $A(x,x)$ is equal to the number of non-zero characteristic values of $A$ and the signature $\sigma$ is equal to the difference between the number of positive and the number of negative characteristic values of $A$.


Hence, in particular, we have the following proposition : 


If under a continuous change of the coefficients of a quadratic form the 
rank remains unchanged, then the signature also remains unchanged. 


Here we have started from the fact that a continuous change of the 
coefficients produces a continuous change of the characteristic values. The 
signature can only change when some characteristic value changes sign. 
But then at some intermediate stage this characteristic value must pass 
through zero, and this results in a change of the rank of the form. 
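The proposition can be illustrated with two one-parameter families of binary forms (our own examples, not from the text). The sketch uses the closed-form characteristic values of a symmetric $2\times 2$ matrix and counts signs as in the statement after (46). In the first family the determinant never vanishes, so rank and signature stay fixed; in the second, the signature changes only by passing through a value of $t$ where the rank drops. Helper names are hypothetical.

```python
import math


def eig_sym_2x2(a, b, d):
    """Characteristic values (ascending) of the symmetric matrix [[a, b], [b, d]]."""
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius


def rank_and_signature(values, tol=1e-12):
    """Rank = number of non-zero values; signature = #positive - #negative."""
    pos = sum(1 for v in values if v > tol)
    neg = sum(1 for v in values if v < -tol)
    return pos + neg, pos - neg


ts = [k / 8 for k in range(-24, 25)]   # t runs from -3 to 3
# x1^2 + 2t x1 x2 - x2^2: the determinant -1 - t^2 never vanishes, so r = 2, sigma = 0 throughout.
print({rank_and_signature(eig_sym_2x2(1.0, t, -1.0)) for t in ts})   # {(2, 0)}
# x1^2 + 2t x1 x2 + x2^2: values 1 - t, 1 + t; the signature jumps only where the rank drops (t = 1).
print([rank_and_signature(eig_sym_2x2(1.0, t, 1.0)) for t in (0.0, 1.0, 2.0)])
# [(2, 2), (1, 1), (2, 0)]
```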


16 If $|Q|=-1$, then (45) is a combination of a rotation with a reflection (see p. 287). However, the reduction to principal axes can always be effected by a proper orthogonal matrix ($|Q|=1$). This follows from the fact that, without changing the canonical form, we can perform the additional transformation

$$\xi_i'=\xi_i\quad(i=1,2,\dots,n-1),\qquad\xi_n'=-\xi_n.$$




§ 6. Pencils of Quadratic Forms 


1. In the theory of small oscillations it is necessary to consider simultaneously two quadratic forms, one of which gives the potential energy and the other the kinetic energy of the system. The second form is always positive definite.
The study of a system of two such forms is the object of this section. Two real quadratic forms

$$A(x,x)=\sum_{i,k=1}^{n}a_{ik}x_ix_k\quad\text{and}\quad B(x,x)=\sum_{i,k=1}^{n}b_{ik}x_ix_k$$

determine the pencil of forms $A(x,x)-\lambda B(x,x)$ ($\lambda$ is a parameter).

If the form $B(x,x)$ is positive definite, the pencil $A(x,x)-\lambda B(x,x)$ is called regular.

The equation

$$|A-\lambda B|=0$$

is called the characteristic equation of the pencil of forms $A(x,x)-\lambda B(x,x)$. We denote by $\lambda_0$ some root of this equation. Since the matrix $A-\lambda_0B$ is singular, there exists a column $z=(z_1,z_2,\dots,z_n)\neq 0$ such that $(A-\lambda_0B)z=0$, or

$$Az=\lambda_0Bz\qquad(z\neq 0).$$

The number $\lambda_0$ will be called a characteristic value of the pencil $A(x,x)-\lambda B(x,x)$ and $z$ a corresponding principal column or 'principal vector' of the pencil. The following theorem holds:


THEOREM 8: The characteristic equation

$$|A-\lambda B|=0$$

of a regular pencil of forms $A(x,x)-\lambda B(x,x)$ always has $n$ real roots $\lambda_k$ with the corresponding principal vectors $z^k=(z_{1k},z_{2k},\dots,z_{nk})$ ($k=1,2,\dots,n$):

$$Az^k=\lambda_kBz^k\qquad(k=1,2,\dots,n). \tag{49}$$

These principal vectors $z^k$ can be chosen such that the relations

$$B(z^i,z^k)=\delta_{ik}\qquad(i,k=1,2,\dots,n) \tag{50}$$

are satisfied.


§ 6. Prencims or Quapratic Forms. 311 


Proof. We observe that (49) can be written as:

$$B^{-1}Az^k=\lambda_kz^k\qquad(k=1,2,\dots,n). \tag{51}$$

Thus, our theorem states that the matrix

$$D=B^{-1}A \tag{52}$$

1. has simple structure, 2. has real characteristic values, and 3. has characteristic columns (vectors) $z^1,z^2,\dots,z^n$ corresponding to these characteristic values and satisfying the relations (50).¹⁷

In order to prove these three statements, we introduce an $n$-dimensional vector space $R$ over the field of real numbers. In this space we fix a basis $e_1,e_2,\dots,e_n$ and introduce a scalar product of two arbitrary vectors

$$\mathbf{x}=\sum_{i=1}^{n}x_ie_i,\qquad\mathbf{y}=\sum_{i=1}^{n}y_ie_i$$

by means of the positive-definite bilinear form $B(x,y)$:

$$(\mathbf{x}\mathbf{y})=B(x,y)=\sum_{i,k=1}^{n}b_{ik}x_iy_k=x^TBy, \tag{53}$$

and hence the square of the length of a vector $\mathbf{x}$ by means of the form $B(x,x)$:

$$(\mathbf{x}\mathbf{x})=B(x,x)=x^TBx,$$

where $x$ and $y$ are the columns $x=(x_1,x_2,\dots,x_n)$, $y=(y_1,y_2,\dots,y_n)$.

It is easy to verify that the metric so introduced satisfies the postulates 1.-5. (p. 248) and is, therefore, euclidean.

We have obtained an $n$-dimensional euclidean space $R$, but the original basis $e_1,e_2,\dots,e_n$ is, in general, not orthonormal. To the matrices $A$, $B$, and $D=B^{-1}A$ there correspond in this basis linear operators in $R$: $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{D}=\mathbf{B}^{-1}\mathbf{A}$.¹⁸


17 If $D$ were a symmetric matrix, then the properties 1. and 2. would follow immediately from properties of a symmetric operator (Chapter IX, p. 284). However, $D$, as a product of two symmetric matrices, is not necessarily itself symmetric, since $D=B^{-1}A$ and $D^T=AB^{-1}$.

18 Since the basis $e_1,e_2,\dots,e_n$ is not orthonormal, the operators $\mathbf{A}$ and $\mathbf{B}$ to which, in this basis, the symmetric matrices $A$ and $B$ correspond, are not necessarily symmetric themselves.


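We shall show that $\mathbf{D}$ is a symmetric operator in $R$ (see Chapter IX, § 13).¹⁹

Indeed, for arbitrary vectors $\mathbf{x}$ and $\mathbf{y}$ with the coordinate columns $x=(x_1,x_2,\dots,x_n)$ and $y=(y_1,y_2,\dots,y_n)$ we have, by (52) and (53),

$$(\mathbf{D}\mathbf{x},\mathbf{y})=(Dx)^TBy=x^TD^TBy=x^TAB^{-1}By=x^TAy$$

and

$$(\mathbf{x},\mathbf{D}\mathbf{y})=x^TBDy=x^TBB^{-1}Ay=x^TAy,$$

i.e.,

$$(\mathbf{D}\mathbf{x},\mathbf{y})=(\mathbf{x},\mathbf{D}\mathbf{y}).$$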

The symmetric operator $\mathbf{D}=\mathbf{B}^{-1}\mathbf{A}$ has real characteristic values $\lambda_1,\lambda_2,\dots,\lambda_n$ and a complete orthonormal system of characteristic vectors $\mathbf{z}^1,\mathbf{z}^2,\dots,\mathbf{z}^n$ (see p. 284, Chapter IX):

$$\mathbf{B}^{-1}\mathbf{A}\mathbf{z}^k=\lambda_k\mathbf{z}^k\qquad(k=1,2,\dots,n), \tag{54}$$

$$(\mathbf{z}^i\mathbf{z}^k)=\delta_{ik}\qquad(i,k=1,2,\dots,n). \tag{54'}$$

Let $z^k=(z_{1k},z_{2k},\dots,z_{nk})$ be the coordinate column of $\mathbf{z}^k$ in the basis $e_1,e_2,\dots,e_n$ ($k=1,2,\dots,n$). Then the equations (54) can be written in the form (51) or (49), and the relations (54'), by (53), yield the equations (50).
This completes the proof.
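The key step of the proof, $(\mathbf{D}\mathbf{x},\mathbf{y})=(\mathbf{x},\mathbf{D}\mathbf{y})$ for the scalar product (53), can be checked in exact rational arithmetic. The sketch below assumes plain Python with the standard `fractions` module and borrows $A$ and $B$ from the example at the end of this section; the vectors $x$, $y$ and all helper names are arbitrary illustrative choices. It also confirms footnote 17: the matrix $D=B^{-1}A$ itself is not symmetric.

```python
from fractions import Fraction


def inverse(m):
    """Exact inverse by Gauss-Jordan elimination over the rationals."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[pivot] = a[pivot], a[c]
        pv = a[c][c]
        a[c] = [v / pv for v in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]


def matvec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def b_inner(B, x, y):
    """The scalar product (53): (x, y) = x^T B y."""
    return dot(x, matvec(B, y))


A = [[2, 0, 1], [0, -2, -5], [1, -5, -3]]
B = [[2, 0, 1], [0, 3, 0], [1, 0, 2]]
D = [[dot(row, col) for col in zip(*A)] for row in inverse(B)]   # D = B^{-1} A
x, y = [1, 2, -1], [3, 0, 4]
print(b_inner(B, matvec(D, x), y), b_inner(B, x, matvec(D, y)))  # both equal x^T A y = -21
print(D == [list(r) for r in zip(*D)])                           # False: D is not symmetric
```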


Note that it follows from (50) that the columns $z^1,z^2,\dots,z^n$ are linearly independent. For suppose that

$$\sum_{k=1}^{n}c_kz^k=0. \tag{55}$$

Then for every $i$ ($1\le i\le n$), by (50),

$$0=B\Bigl(z^i,\sum_{k=1}^{n}c_kz^k\Bigr)=\sum_{k=1}^{n}c_kB(z^i,z^k)=c_i.$$

Then all the $c_i$ ($i=1,2,\dots,n$) in (55) are zero and there is no linear dependence among the columns $z^1,z^2,\dots,z^n$.

A square matrix formed from principal columns $z^1,z^2,\dots,z^n$ satisfying the relations (50),

$$Z=(z^1,z^2,\dots,z^n)=\|z_{ik}\|_1^n,$$

will be called a principal matrix for the pencil of forms $A(x,x)-\lambda B(x,x)$.


19 Hence D is similar to some symmetric matrix. 




The principal matrix $Z$ is non-singular ($|Z|\neq 0$), because its columns are linearly independent.
The equation (50) can be written as follows:

$$z^{iT}Bz^k=\delta_{ik}\qquad(i,k=1,2,\dots,n). \tag{56}$$

Moreover, when we multiply both sides of (49) on the left by the row matrix $z^{iT}$, we obtain:

$$z^{iT}Az^k=\lambda_kz^{iT}Bz^k=\lambda_k\delta_{ik}\qquad(i,k=1,2,\dots,n). \tag{57}$$

By introducing the principal matrix $Z=(z^1,z^2,\dots,z^n)$, we can represent (56) and (57) in the form

$$Z^TAZ=\|\lambda_k\delta_{ik}\|_1^n,\qquad Z^TBZ=E. \tag{58}$$

The formulas (58) show that the non-singular transformation

$$x=Z\xi \tag{59}$$

reduces the quadratic forms $A(x,x)$ and $B(x,x)$ simultaneously to sums of squares:

$$\sum_{k=1}^{n}\lambda_k\xi_k^2\quad\text{and}\quad\sum_{k=1}^{n}\xi_k^2. \tag{60}$$


This property of (59) characterizes a principal matrix $Z$. For suppose that the transformation (59) reduces the forms $A(x,x)$ and $B(x,x)$ simultaneously to the canonical forms (60). Then (58) holds, and hence (56) and (57) hold for $Z$. (58) implies that $Z$ is non-singular ($|Z|\neq 0$). We rewrite (57) as follows:

$$z^{iT}(Az^k-\lambda_kBz^k)=0\qquad(i=1,2,\dots,n), \tag{61}$$

where $k$ has an arbitrary fixed value ($1\le k\le n$). The system of equations (61) can be contracted into the single equation

$$Z^T(Az^k-\lambda_kBz^k)=0;$$

hence, since $Z^T$ is non-singular,

$$Az^k=\lambda_kBz^k,$$

i.e., (49) holds for every $k$. Therefore $Z$ is a principal matrix. Thus we have proved the following theorem:




THEOREM 9: If $Z=\|z_{ik}\|_1^n$ is a principal matrix of a regular pencil of forms $A(x,x)-\lambda B(x,x)$, then the transformation

$$x=Z\xi \tag{62}$$

reduces the forms $A(x,x)$ and $B(x,x)$ simultaneously to sums of squares

$$\sum_{k=1}^{n}\lambda_k\xi_k^2,\qquad\sum_{k=1}^{n}\xi_k^2, \tag{63}$$

where $\lambda_1,\lambda_2,\dots,\lambda_n$ are the characteristic values of the pencil $A(x,x)-\lambda B(x,x)$ corresponding to the columns $z^1,z^2,\dots,z^n$ of $Z$.

Conversely, if some transformation (62) simultaneously reduces $A(x,x)$ and $B(x,x)$ to the form (63), then $Z=\|z_{ik}\|_1^n$ is a principal matrix of the regular pencil of forms $A(x,x)-\lambda B(x,x)$.

Sometimes the characteristic property of the transformation (62) formulated in Theorem 9 is used for the construction of a principal matrix and the proof of Theorem 8.²⁰ For this purpose, we first of all carry out a transformation of variables $x=Ty$ that reduces the form $B(x,x)$ to the ‘unit’ sum of squares $\sum_{k=1}^{n}y_k^2$ (which is always possible, since $B(x,x)$ is positive definite). Then $A(x,x)$ is carried into a certain form $A_1(y,y)$. Now the form $A_1(y,y)$ is reduced to the form $\sum_{k=1}^{n}\lambda_k\xi_k^2$ by an orthogonal transformation $y=Q\xi$ (reduction to principal axes). Then, obviously,²¹ $\sum_{k=1}^{n}y_k^2=\sum_{k=1}^{n}\xi_k^2$. Thus the transformation $x=Z\xi$, where $Z=TQ$, reduces the two given forms to (63). Afterwards it turns out (as we have shown on p. 313) that the columns $z^1,z^2,\dots,z^n$ of $Z$ satisfy the relations (49) and (50).
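The two-step construction $Z=TQ$ can be sketched numerically. The choices here are assumptions of the sketch, not of the text: $T$ is taken as $(L^T)^{-1}$ from a Cholesky factorization $B=LL^T$ (so that $T^TBT=E$), $Q$ comes from Jacobi rotations, and all helper names are hypothetical. Applied to the matrices of the example later in this section, it should recover the characteristic values $1, 1, -4$.

```python
import math


def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def cholesky(b):
    """Lower-triangular L with B = L L^T; raises if B is not positive definite."""
    n = len(b)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = b[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("B(x, x) is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def lower_inverse(low):
    """Inverse of a non-singular lower-triangular matrix (forward substitution)."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            rhs = (1.0 if i == j else 0.0) - sum(low[i][k] * inv[k][j] for k in range(i))
            inv[i][j] = rhs / low[i][i]
    return inv


def jacobi(a, tol=1e-13, max_sweeps=50):
    """Orthogonal Q and the diagonal of Q^T a Q for a real symmetric a (Jacobi rotations)."""
    n = len(a)
    a = [row[:] for row in a]
    q = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if max((abs(a[i][j]) for i in range(n) for j in range(n) if i != j), default=0.0) < tol:
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if a[p][r] == 0.0:
                    continue
                t = 0.5 * math.atan2(2.0 * a[p][r], a[r][r] - a[p][p])
                g = [[float(i == j) for j in range(n)] for i in range(n)]
                g[p][p] = g[r][r] = math.cos(t)
                g[p][r], g[r][p] = math.sin(t), -math.sin(t)
                a = matmul(transpose(g), matmul(a, g))
                q = matmul(q, g)
    return [a[i][i] for i in range(n)], q


def principal_matrix(A, B):
    """Return (lams, Z) with Z = T Q, so that Z^T B Z = E and Z^T A Z = diag(lams)."""
    T = transpose(lower_inverse(cholesky(B)))   # x = T y turns B(x, x) into sum y_k^2
    A1 = matmul(transpose(T), matmul(A, T))     # the form A_1(y, y)
    lams, Q = jacobi(A1)                        # y = Q xi: reduction to principal axes
    return lams, matmul(T, Q)


A = [[2, 0, 1], [0, -2, -5], [1, -5, -3]]
B = [[2, 0, 1], [0, 3, 0], [1, 0, 2]]
lams, Z = principal_matrix(A, B)
print(sorted(round(v, 9) for v in lams))   # [-4.0, 1.0, 1.0]
```

The columns of the computed $Z$ may differ from the hand-computed ones by sign and, within the double root $\lambda=1$, by an orthogonal change of basis; Theorem 9 permits exactly this freedom.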

In the special case where $B(x,x)$ is the unit form, i.e., $B(x,x)=\sum_{k=1}^{n}x_k^2$, so that $B=E$, the characteristic equation of the pencil $A(x,x)-\lambda B(x,x)$ coincides with the characteristic equation of $A$, and the principal vectors of the pencil are characteristic vectors of $A$. In this case the relations (50) can be written as follows:

$$z^{iT}z^k=\delta_{ik}\qquad(i,k=1,2,\dots,n),$$

and they express the orthonormality of the columns $z^1,z^2,\dots,z^n$.


20 See [17], pp. 56-57. 
21 An orthogonal transformation does not alter a sum of squares of the variables, because $(Qx)^TQx=x^Tx$.




2. Theorems 8 and 9 admit an intuitive geometric interpretation. We introduce a euclidean space $R$ with the basis $e_1,e_2,\dots,e_n$ and the fundamental metric form $B(x,x)$ just as was done for the proof of Theorem 8. In $R$ we consider a central hypersurface of the second order whose equation is

$$A(x,x)=\sum_{i,k=1}^{n}a_{ik}x_ix_k=c. \tag{64}$$


After the coordinate transformation $x=Z\xi$, where $Z=\|z_{ik}\|_1^n$ is a principal matrix of the pencil $A(x,x)-\lambda B(x,x)$, the new basis vectors are the vectors $\mathbf{z}^1,\mathbf{z}^2,\dots,\mathbf{z}^n$ whose coordinates in the old basis form the columns of $Z$, i.e., the principal vectors of the pencil. These vectors form an orthonormal basis in which the equation of the hypersurface (64) has the form

$$\sum_{k=1}^{n}\lambda_k\xi_k^2=c. \tag{65}$$

Therefore the principal vectors $\mathbf{z}^1,\mathbf{z}^2,\dots,\mathbf{z}^n$ of the pencil coincide in direction with the principal axes of the hypersurface (64), and the characteristic values $\lambda_1,\lambda_2,\dots,\lambda_n$ of the pencil determine the lengths $a_k$ of the semi-axes:

$$\lambda_k=\pm\frac{c}{a_k^2}\qquad(k=1,2,\dots,n).$$

Thus, the task of determining the characteristic values and the principal vectors of a regular pencil of forms $A(x,x)-\lambda B(x,x)$ is equivalent to the task of reducing the equation (64) of a central hypersurface of the second order to principal axes, provided the equation of the hypersurface is given in a general skew coordinate system²² in which the ‘unit sphere’ has the equation $B(x,x)=1$.


Example. Given the equation of a surface of the second order

$$2x^2-2y^2-3z^2-10yz+2xz-4=0 \tag{66}$$

in a general skew coordinate system in which the equation of the unit sphere is

$$2x^2+3y^2+2z^2+2xz=1, \tag{67}$$

it is required to reduce equation (66) to principal axes.
In this case

$$A=\begin{pmatrix}2&0&1\\0&-2&-5\\1&-5&-3\end{pmatrix},\qquad B=\begin{pmatrix}2&0&1\\0&3&0\\1&0&2\end{pmatrix}.$$


22 I.e., a skew coordinate system with distinct units of length along the axes.




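The characteristic equation of the pencil $|A-\lambda B|=0$ has the form

$$\begin{vmatrix}2-2\lambda&0&1-\lambda\\0&-2-3\lambda&-5\\1-\lambda&-5&-3-2\lambda\end{vmatrix}=0. \tag{68}$$

This equation has three roots: $\lambda_1=1$, $\lambda_2=1$, $\lambda_3=-4$.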
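The factorization of (68) is not written out in the text. Expanding the determinant along its first row (an intermediate step we supply) gives

$$\begin{aligned}|A-\lambda B|&=(2-2\lambda)\bigl[(-2-3\lambda)(-3-2\lambda)-25\bigr]+(1-\lambda)(2+3\lambda)(1-\lambda)\\&=-2(1-\lambda)^2(6\lambda+19)+(1-\lambda)^2(3\lambda+2)\\&=-9(\lambda-1)^2(\lambda+4),\end{aligned}$$

where we used $6\lambda^2+13\lambda-19=(\lambda-1)(6\lambda+19)$. This confirms the double root $\lambda=1$ and the simple root $\lambda=-4$.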


We denote the coordinates of a principal vector corresponding to the characteristic value 1 by $u, v, w$. The values of $u, v, w$ are determined from the system of homogeneous equations whose coefficients are the elements of the determinant (68) for $\lambda=1$:

$$\begin{aligned}0\cdot u+0\cdot v+0\cdot w&=0,\\ 0\cdot u-5v-5w&=0,\\ 0\cdot u-5v-5w&=0.\end{aligned}$$

In fact we have only one relation

$$v+w=0.$$


To the characteristic value $\lambda=1$ there must correspond two orthonormal principal vectors. The coordinates of the first can be chosen arbitrarily, provided they satisfy the relation $v+w=0$.

We set

$$u=0,\quad v,\quad w=-v.$$

We take the coordinates of the second principal vector in the form

$$u',\quad v',\quad w'=-v'$$

and write down the condition for orthogonality ($B(z^1,z^2)=0$):

$$2uu'+3vv'+2ww'+uw'+u'w=0.$$

Substituting $u=0$, $w=-v$, $w'=-v'$ gives $v(5v'-u')=0$; hence we find $u'=5v'$. Thus, the coordinates of the second principal vector are

$$u'=5v',\quad v',\quad w'=-v'.$$

Similarly, by setting $\lambda=-4$ in the characteristic determinant, we find for the corresponding principal vector:

$$u'',\quad v''=-u'',\quad w''=-2u''.$$


§ 7. EXTREMAL PROPERTIES OF CHARACTERISTIC VALUES 317 


The values of $v$, $v'$, and $u''$ are determined from the condition that the coordinates of a principal vector must satisfy the equation of the unit sphere ($B(x,x)=1$), i.e., (67). Hence we find:

$$v=\frac{1}{\sqrt5},\qquad v'=\frac{1}{3\sqrt5},\qquad u''=-\frac13.$$
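Therefore the principal matrix has the form

$$Z=\begin{pmatrix}0&\frac{\sqrt5}{3}&-\frac13\\ \frac{1}{\sqrt5}&\frac{1}{3\sqrt5}&\frac13\\ -\frac{1}{\sqrt5}&-\frac{1}{3\sqrt5}&\frac23\end{pmatrix},$$

and the corresponding coordinate transformation ($x=Z\xi$) reduces the equations (66) and (67) to the canonical form

$$\xi_1^2+\xi_2^2-4\xi_3^2-4=0,\qquad\xi_1^2+\xi_2^2+\xi_3^2=1.$$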


The first equation can also be written as follows:

$$\frac{\xi_1^2}{4}+\frac{\xi_2^2}{4}-\frac{\xi_3^2}{1}=1.$$

This is the equation of a one-sheet hyperboloid of rotation with real semi-axes equal to 2 and an imaginary one equal to 1. The coordinates of the endpoint of the axis of rotation are determined by the third column of $Z$, i.e., $-1/3,\ 1/3,\ 2/3$. The coordinates of the endpoints of the other two orthogonal axes are given by the first and second columns.
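As a numerical cross-check of the example (a sketch in plain Python with hypothetical helper names), one can multiply out $Z^TAZ$ and $Z^TBZ$ for the principal matrix found above. Up to floating-point rounding, the first should be $\mathrm{diag}(1,1,-4)$ and the second the unit matrix $E$, as (58) requires.

```python
import math


def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


s5 = math.sqrt(5)
A = [[2, 0, 1], [0, -2, -5], [1, -5, -3]]   # quadratic part of (66)
B = [[2, 0, 1], [0, 3, 0], [1, 0, 2]]       # the unit sphere (67)
Z = [[0.0, s5 / 3, -1 / 3],                 # columns: the three principal vectors
     [1 / s5, 1 / (3 * s5), 1 / 3],
     [-1 / s5, -1 / (3 * s5), 2 / 3]]

ZtAZ = matmul(transpose(Z), matmul(A, Z))   # should be diag(1, 1, -4)
ZtBZ = matmul(transpose(Z), matmul(B, Z))   # should be E
for row in ZtAZ:
    print([round(v, 12) for v in row])
```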


§ 7. Extremal Properties of the Characteristic Values of a Regular Pencil of Forms²³


1. Suppose that two quadratic forms are given,

$$A(x,x)=\sum_{i,k=1}^{n}a_{ik}x_ix_k\quad\text{and}\quad B(x,x)=\sum_{i,k=1}^{n}b_{ik}x_ix_k,$$

of which $B(x,x)$ is positive definite. We number the characteristic values of the regular pencil of forms $A(x,x)-\lambda B(x,x)$ in non-descending order:

$$\lambda_1\le\lambda_2\le\dots\le\lambda_n. \tag{69}$$


23 In the exposition of this section, we follow the book [17], § 10. 




The principal vectors²⁴ corresponding to these characteristic values are denoted, as before, by $z^1,z^2,\dots,z^n$:

$$z^k=(z_{1k},z_{2k},\dots,z_{nk})\qquad(k=1,2,\dots,n).$$


Let us determine the least value (minimum) of the ratio of the forms

$$\frac{A(x,x)}{B(x,x)},$$

considering all possible values of the variables, not all equal to zero ($x\neq 0$). For this purpose it is convenient to go over to new variables $\xi_1,\xi_2,\dots,\xi_n$ by means of the transformation

$$x=Z\xi\qquad\Bigl(x_i=\sum_{k=1}^{n}z_{ik}\xi_k;\ i=1,2,\dots,n\Bigr),$$

where $Z=\|z_{ik}\|_1^n$ is a principal matrix of the pencil $A(x,x)-\lambda B(x,x)$. In the new variables the ratio of the forms is represented (see (63)) by

$$\frac{A(x,x)}{B(x,x)}=\frac{\lambda_1\xi_1^2+\lambda_2\xi_2^2+\dots+\lambda_n\xi_n^2}{\xi_1^2+\xi_2^2+\dots+\xi_n^2}. \tag{70}$$


On the real axis we take the $n$ points $\lambda_1,\lambda_2,\dots,\lambda_n$. We ascribe to these points non-negative masses $m_1=\xi_1^2,\ m_2=\xi_2^2,\ \dots,\ m_n=\xi_n^2$, respectively. Then, by (70), the quotient $\frac{A(x,x)}{B(x,x)}$ is the coordinate of the center of these masses. Therefore

$$\lambda_1\le\frac{A(x,x)}{B(x,x)}\le\lambda_n.$$

Let us, for the time being, ignore the second part of the inequality and investigate when the equality sign holds in the first part. For this purpose, we group together the equal characteristic values in (69):

$$\lambda_1=\dots=\lambda_{p_1}<\lambda_{p_1+1}=\dots=\lambda_{p_2}<\cdots. \tag{71}$$

The center of mass can coincide with the least value $\lambda_1$ only if all the masses are zero except at this point, i.e., when

$$\xi_{p_1+1}=\dots=\xi_n=0.$$

In this case the corresponding $x$ is a linear combination of the principal columns $z^1,z^2,\dots,z^{p_1}$.²⁵ All these columns correspond to the characteristic value $\lambda_1$, so that $x$ is also a principal column (vector) for $\lambda=\lambda_1$.


24 Here we use the term ‘principal vector’ in the sense of a principal column of the pencil (see p. 310). Throughout this section, having the geometric interpretation in mind, we often call a column a vector.

25 From $x=Z\xi$ it follows that $x=\sum_{k=1}^{n}\xi_kz^k$.




We have proved:

THEOREM 10: The smallest characteristic value of the regular pencil $A(x,x)-\lambda B(x,x)$ is the minimum of the ratio of the forms $A(x,x)$ and $B(x,x)$,

$$\lambda_1=\min\frac{A(x,x)}{B(x,x)}, \tag{72}$$

and this minimum is only assumed for principal vectors of the characteristic value $\lambda_1$.
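Theorem 10 can be spot-checked on the pencil of the example in § 6. In the numbering (69) its smallest characteristic value is $\lambda_1=-4$, with the $B$-normalized principal vector $(-1/3,\,1/3,\,2/3)$ (the third column of $Z$ there). The sketch below (plain Python, hypothetical names; random sampling is an illustration, not a proof) evaluates the ratio at many random nonzero vectors: no value should fall below $-4$ or exceed $\lambda_n=1$, and the value $-4$ is attained at the principal vector.

```python
import random

A = [[2, 0, 1], [0, -2, -5], [1, -5, -3]]
B = [[2, 0, 1], [0, 3, 0], [1, 0, 2]]


def form(m, x, y):
    """Bilinear form m(x, y) = x^T m y."""
    return sum(m[i][k] * x[i] * y[k] for i in range(len(x)) for k in range(len(y)))


def ratio(x):
    """The ratio A(x, x) / B(x, x) for x != 0."""
    return form(A, x, x) / form(B, x, x)


random.seed(1)
samples = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(20000)]
lowest = min(ratio(x) for x in samples)
z1 = [-1 / 3, 1 / 3, 2 / 3]   # principal vector for lambda_1 = -4
print(lowest >= -4.0, ratio(z1))   # True, and a value equal to -4 up to rounding
```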


2. In order to give an analogous ‘minimal’ characterization of the next characteristic value $\lambda_2$, we restrict ourselves to all the vectors orthogonal to $z^1$, i.e., to those that satisfy the equation²⁶

$$B(z^1,x)=0.$$

For these vectors $\xi_1=0$, so that by (70)

$$\frac{A(x,x)}{B(x,x)}=\frac{\lambda_2\xi_2^2+\dots+\lambda_n\xi_n^2}{\xi_2^2+\dots+\xi_n^2}\ge\lambda_2,$$

and therefore

$$\lambda_2=\min\frac{A(x,x)}{B(x,x)}\qquad(B(z^1,x)=0).$$

Here the equality sign holds only for those vectors orthogonal to $z^1$ that are principal vectors for the characteristic value $\lambda_2$.

Proceeding to the subsequent characteristic values, we eventually obtain the following theorem:

THEOREM 11: For every $p$ ($1\le p\le n$) the $p$-th characteristic value $\lambda_p$ in (69) is the minimum of the ratio of the forms

$$\lambda_p=\min\frac{A(x,x)}{B(x,x)}, \tag{73}$$

provided that the variable vector $x$ is orthogonal to the first $p-1$ orthonormal principal vectors $z^1,z^2,\dots,z^{p-1}$:


26 Here, and in what follows, we shall mean by the orthogonality of two vectors (columns) $x, y$ that the equation $B(x,y)=0$ holds. This is in complete agreement with the geometric interpretation given in the preceding section. We shall regard the quantities $x_1,x_2,\dots,x_n$ as the coordinates of a vector $\mathbf{x}$ in some basis of a euclidean space in which the square of the length (the norm) is given by the positive-definite form $B(x,x)=\sum_{i,k=1}^{n}b_{ik}x_ix_k$. In this metric the vectors $z^1,z^2,\dots,z^n$ form an orthonormal basis. Therefore, if the vector $x=\sum_{k=1}^{n}\xi_kz^k$ is orthogonal to one of the $z^k$, then the corresponding $\xi_k=0$.




$$B(z^1,x)=0,\quad\dots,\quad B(z^{p-1},x)=0. \tag{74}$$

Moreover, the minimum is assumed only for those vectors that satisfy the condition (74) and are at the same time principal vectors for the characteristic value $\lambda_p$.
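For the same example pencil ($\lambda_1=-4$, $\lambda_2=\lambda_3=1$), Theorem 11 with $p=2$ says that the minimum of the ratio over vectors with $B(z^1,x)=0$ is $\lambda_2=1$. Because $\lambda_2=\lambda_3$ here, the ratio should in fact equal 1 for every such vector. In this sketch (plain Python; the names and the projection step are our own illustration), random vectors are made $B$-orthogonal to $z^1$ by subtracting $B(z^1,x)\,z^1$, which works because $B(z^1,z^1)=1$.

```python
import random

A = [[2, 0, 1], [0, -2, -5], [1, -5, -3]]
B = [[2, 0, 1], [0, 3, 0], [1, 0, 2]]
z1 = [-1 / 3, 1 / 3, 2 / 3]   # B-normalized principal vector for lambda_1 = -4


def form(m, x, y):
    """Bilinear form m(x, y) = x^T m y for 3-component vectors."""
    return sum(m[i][k] * x[i] * y[k] for i in range(3) for k in range(3))


def project_out(z, x):
    """Remove the component of x along z; the result satisfies B(z, x) = 0 when B(z, z) = 1."""
    c = form(B, z, x)
    return [xi - c * zi for xi, zi in zip(x, z)]


random.seed(2)
ratios = []
for _ in range(1000):
    x = project_out(z1, [random.uniform(-1.0, 1.0) for _ in range(3)])
    if form(B, x, x) > 1e-3:   # skip (rare) nearly-zero projections
        ratios.append(form(A, x, x) / form(B, x, x))
print(min(ratios), max(ratios))   # both equal 1 up to rounding
```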


3. The characterization of $\lambda_p$ given in Theorem 11 has the disadvantage that it is connected with the preceding principal vectors $z^1,z^2,\dots,z^{p-1}$ and can therefore be used only when these vectors are known. Moreover, there is a certain arbitrariness in the choice of these vectors.


In order to give a characterization of $\lambda_p$ ($p=1,2,\dots,n$) free from these defects, we introduce the concept of constraints imposed on the variables $x_1,x_2,\dots,x_n$.

Suppose that linear forms in the variables $x_1,x_2,\dots,x_n$ are given:

$$L_k(x)=l_{1k}x_1+l_{2k}x_2+\dots+l_{nk}x_n\qquad(k=1,2,\dots,h). \tag{74'}$$

We shall say that the variables $x_1,x_2,\dots,x_n$ or (what is the same) the vector $x$ is subject to $h$ constraints $L_1,L_2,\dots,L_h$ if only such values of the variables are considered that satisfy the system of equations

$$L_k(x)=0\qquad(k=1,2,\dots,h). \tag{74''}$$


Preserving the notation (74') for arbitrary linear forms, we introduce a specialized notation for the ‘scalar product’ of $x$ with the principal vectors $z^k$:²⁷

$$\hat L_k(x)=B(z^k,x)\qquad(k=1,2,\dots,n). \tag{75}$$

Furthermore, when the variable vector is subject to the constraints (74''), we shall denote $\min\frac{A(x,x)}{B(x,x)}$ as follows:

$$\mu\Bigl(\frac{A}{B};\ L_1,L_2,\dots,L_h\Bigr).$$

In this notation, (73) is written as follows:

$$\lambda_p=\mu\Bigl(\frac{A}{B};\ \hat L_1,\hat L_2,\dots,\hat L_{p-1}\Bigr)\qquad(p=1,2,\dots,n). \tag{76}$$
We consider the constraints

$$L_1(x)=0,\quad\dots,\quad L_{p-1}(x)=0 \tag{77}$$

and

$$\hat L_{p+1}(x)=0,\quad\dots,\quad\hat L_n(x)=0. \tag{78}$$


27 $\hat L_k(x)=z^{kT}Bx=\hat l_{1k}x_1+\hat l_{2k}x_2+\dots+\hat l_{nk}x_n$, where $\hat l_{1k},\hat l_{2k},\dots,\hat l_{nk}$ are the elements of the row matrix $z^{kT}B$ ($k=1,2,\dots,n$).




Since the number of constraints (77) and (78), namely $(p-1)+(n-p)=n-1$, is less than $n$, there exists a vector $x^{(1)}\neq 0$ satisfying all these constraints. Since the constraints (78) express the orthogonality of $x$ to the principal vectors $z^{p+1},\dots,z^n$, the corresponding coordinates of $x^{(1)}$ are $\xi_{p+1}=\dots=\xi_n=0$. Therefore, by (70),

$$\frac{A(x^{(1)},x^{(1)})}{B(x^{(1)},x^{(1)})}=\frac{\lambda_1\xi_1^2+\dots+\lambda_p\xi_p^2}{\xi_1^2+\dots+\xi_p^2}\le\lambda_p.$$

But then

$$\mu\Bigl(\frac{A}{B};\ L_1,L_2,\dots,L_{p-1}\Bigr)\le\frac{A(x^{(1)},x^{(1)})}{B(x^{(1)},x^{(1)})}\le\lambda_p.$$

This inequality in conjunction with (76) shows that for variable constraints $L_1,L_2,\dots,L_{p-1}$ the value of $\mu$ remains less than or equal to $\lambda_p$, and becomes $\lambda_p$ if the specialized constraints $\hat L_1,\hat L_2,\dots,\hat L_{p-1}$ are taken.

Thus we have proved:

THEOREM 12: If we consider the minimum of the ratio of the two forms $\frac{A(x,x)}{B(x,x)}$ for $p-1$ arbitrary, but variable, constraints $L_1,L_2,\dots,L_{p-1}$, then the maximum of these minima is equal to $\lambda_p$:

$$\lambda_p=\max\mu\Bigl(\frac{A}{B};\ L_1,L_2,\dots,L_{p-1}\Bigr)\qquad(p=1,2,\dots,n). \tag{79}$$

Theorem 12 gives a ‘maximal-minimal’ characterization of $\lambda_1,\lambda_2,\dots,\lambda_n$, in contrast to the ‘minimal’ characterization which we discussed in Theorem 11.
4. Note that when in the pencil $A(x,x)-\lambda B(x,x)$ the form $A(x,x)$ is replaced by $-A(x,x)$, all the characteristic values of the pencil change sign, but the corresponding principal vectors remain unchanged. Thus, the characteristic values of the pencil $-A(x,x)-\lambda B(x,x)$ are

$$-\lambda_n\le-\lambda_{n-1}\le\dots\le-\lambda_1.$$

Moreover, by using the notation

$$\nu\Bigl(\frac{A}{B};\ L_1,L_2,\dots,L_h\Bigr)=\max\frac{A(x,x)}{B(x,x)} \tag{80}$$

when the variable vector is subject to the constraints $L_1,L_2,\dots,L_h$, we can write:

$$\mu\Bigl(-\frac{A}{B};\ L_1,L_2,\dots,L_h\Bigr)=-\nu\Bigl(\frac{A}{B};\ L_1,L_2,\dots,L_h\Bigr)$$

and

$$\max\mu\Bigl(-\frac{A}{B};\ L_1,L_2,\dots,L_{p-1}\Bigr)=-\min\nu\Bigl(\frac{A}{B};\ L_1,L_2,\dots,L_{p-1}\Bigr).$$

Therefore, by applying Theorems 10, 11, and 12 to the ratio $-\frac{A(x,x)}{B(x,x)}$, we obtain instead of (72), (76), and (79) the formulas




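$$\lambda_n=\max\frac{A(x,x)}{B(x,x)},$$

$$\lambda_{n-p+1}=\nu\Bigl(\frac{A}{B};\ \hat L_n,\hat L_{n-1},\dots,\hat L_{n-p+2}\Bigr),$$

$$\lambda_{n-p+1}=\min\nu\Bigl(\frac{A}{B};\ L_1,L_2,\dots,L_{p-1}\Bigr)\qquad(p=1,2,\dots,n).$$

These formulas establish the ‘maximal’ and the ‘minimal-maximal’ properties, respectively, of $\lambda_1,\lambda_2,\dots,\lambda_n$, which we formulate in the following theorem: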


THEOREM 13: Suppose that to the characteristic values

$$\lambda_1\le\lambda_2\le\dots\le\lambda_n$$

of the regular pencil of forms $A(x,x)-\lambda B(x,x)$ there correspond the linearly independent principal vectors of the pencil $z^1,z^2,\dots,z^n$. Then:

1) The largest characteristic value $\lambda_n$ is the maximum of the ratio of the forms $\frac{A(x,x)}{B(x,x)}$:

$$\lambda_n=\max\frac{A(x,x)}{B(x,x)}, \tag{81}$$

and this maximum is assumed only for principal vectors of the pencil corresponding to the characteristic value $\lambda_n$.

2) The characteristic value $p$-th from the end, $\lambda_{n-p+1}$ ($2\le p\le n$), is the maximum of the same ratio of the forms,

$$\lambda_{n-p+1}=\max\frac{A(x,x)}{B(x,x)}, \tag{82}$$

provided that the variable vector $x$ is subject to the constraints²⁸

$$B(z^n,x)=0,\quad B(z^{n-1},x)=0,\quad\dots,\quad B(z^{n-p+2},x)=0, \tag{83}$$

i.e.,

$$\lambda_{n-p+1}=\nu\Bigl(\frac{A}{B};\ \hat L_n,\hat L_{n-1},\dots,\hat L_{n-p+2}\Bigr); \tag{84}$$

this maximum is assumed only for principal vectors of the pencil corresponding to the characteristic value $\lambda_{n-p+1}$ and satisfying the constraints (83).


28 In a euclidean space with a metric form $B(x,x)$, the condition (83) expresses the fact that the vector $x$ is orthogonal to the principal vectors $z^{n-p+2},\dots,z^n$. See footnote 26.




3) If in the maximum of the ratio of the forms $\frac{A(x,x)}{B(x,x)}$ with the constraints

$$L_1(x)=0,\quad\dots,\quad L_{p-1}(x)=0\qquad(2\le p\le n)$$

the constraints are varied, then the least value (minimum) of this maximum is equal to $\lambda_{n-p+1}$:

$$\lambda_{n-p+1}=\min\nu\Bigl(\frac{A}{B};\ L_1,L_2,\dots,L_{p-1}\Bigr). \tag{85}$$


5. Let

$$L_1^0(x)=0,\quad L_2^0(x)=0,\quad\dots,\quad L_h^0(x)=0 \tag{86}$$

be $h$ independent constraints.²⁹ Then we can express $h$ of the variables $x_1,x_2,\dots,x_n$ by the remaining variables, which we denote by $v_1,v_2,\dots,v_{n-h}$. Therefore, when the constraints (86) are imposed, the regular pencil of forms $A(x,x)-\lambda B(x,x)$ goes over into the pencil $A^0(v,v)-\lambda B^0(v,v)$, where $B^0(v,v)$ is again a positive-definite form (only in $n-h$ variables). The regular pencil so obtained has $n-h$ real characteristic values

$$\lambda_1^0\le\lambda_2^0\le\dots\le\lambda_{n-h}^0. \tag{87}$$


Subject to the constraints (86) we can express all the variables in terms of $n-h$ independent ones $v_1,v_2,\dots,v_{n-h}$ in various ways. However, the characteristic values (87) are independent of this arbitrariness and have completely definite values. This follows, for example, from the maximal-minimal property of the characteristic values

$$\lambda_1^0=\min\frac{A^0(v,v)}{B^0(v,v)}=\mu\Bigl(\frac{A}{B};\ L_1^0,L_2^0,\dots,L_h^0\Bigr) \tag{88}$$

and, in general,

$$\lambda_p^0=\max\mu\Bigl(\frac{A^0}{B^0};\ L_1,L_2,\dots,L_{p-1}\Bigr)=\max\mu\Bigl(\frac{A}{B};\ L_1^0,\dots,L_h^0,L_1,\dots,L_{p-1}\Bigr), \tag{89}$$

where in (89) only the constraints $L_1,L_2,\dots,L_{p-1}$ are allowed to vary.


29 The constraints (86) are independent when the linear forms $L_1^0(x),L_2^0(x),\dots,L_h^0(x)$ on the left-hand sides of (86) are independent.




The following theorem holds:

THEOREM 14: If $\lambda_1\le\lambda_2\le\dots\le\lambda_n$ are the characteristic values of the regular pencil of forms $A(x,x)-\lambda B(x,x)$ and $\lambda_1^0\le\lambda_2^0\le\dots\le\lambda_{n-h}^0$ are the characteristic values of the same pencil subject to $h$ independent constraints, then

$$\lambda_p\le\lambda_p^0\le\lambda_{p+h}\qquad(p=1,2,\dots,n-h). \tag{90}$$


Proof. The inequality $\lambda_p\le\lambda_p^0$ ($p=1,2,\dots,n-h$) follows easily from (79) and (89). For when new constraints are added, the value of the minimum $\mu\bigl(\frac{A}{B};\ L_1,\dots,L_{p-1}\bigr)$ increases or remains the same. Therefore

$$\lambda_p=\max\mu\Bigl(\frac{A}{B};\ L_1,\dots,L_{p-1}\Bigr)\le\max\mu\Bigl(\frac{A}{B};\ L_1^0,\dots,L_h^0,L_1,\dots,L_{p-1}\Bigr)=\lambda_p^0.$$

The second part of the inequality (90) holds in view of the relations

$$\lambda_p^0=\max\mu\Bigl(\frac{A}{B};\ L_1^0,\dots,L_h^0,L_1,\dots,L_{p-1}\Bigr)\le\max\mu\Bigl(\frac{A}{B};\ L_1,\dots,L_{p-1},L_p,\dots,L_{p+h-1}\Bigr)=\lambda_{p+h}.$$

Here not only are $L_1,\dots,L_{p-1}$ varied on the right-hand side, but $L_p,\dots,L_{p+h-1}$ also; on the left-hand side the latter are replaced by the fixed constraints $L_1^0,\dots,L_h^0$.

This completes the proof.
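Theorem 14 can be checked on the pencil of the example in § 6, whose characteristic values are $-4, 1, 1$. This minimal sketch in plain Python makes an assumption of its own: the single constraint ($h=1$) is of the simple type $x_j=0$, so the constrained pencil is obtained by deleting row and column $j$ from $A$ and $B$. For a $2\times 2$ pencil the characteristic equation $|A-\lambda B|=0$ is a quadratic, solved directly. Helper names are hypothetical.

```python
import math


def pencil_eigs_2x2(A, B):
    """Roots (ascending) of |A - lam*B| = 0 for symmetric 2x2 A and positive-definite B."""
    (a11, a12), (_, a22) = A
    (b11, b12), (_, b22) = B
    alpha = b11 * b22 - b12 ** 2                    # > 0 since B is positive definite
    beta = -(a11 * b22 + a22 * b11 - 2 * a12 * b12)
    gamma = a11 * a22 - a12 ** 2
    disc = math.sqrt(max(beta ** 2 - 4 * alpha * gamma, 0.0))   # real roots by Theorem 8
    return sorted([(-beta - disc) / (2 * alpha), (-beta + disc) / (2 * alpha)])


def restrict(M, keep):
    """Submatrix on the kept indices: the constraint sets the remaining coordinate to zero."""
    return [[M[i][j] for j in keep] for i in keep]


A = [[2, 0, 1], [0, -2, -5], [1, -5, -3]]
B = [[2, 0, 1], [0, 3, 0], [1, 0, 2]]
full = [-4.0, 1.0, 1.0]       # characteristic values of the example pencil, in order (69)

for keep in ([0, 1], [1, 2]):  # the constraints x3 = 0 and x1 = 0
    lam0 = pencil_eigs_2x2(restrict(A, keep), restrict(B, keep))
    for p in range(2):         # h = 1: lambda_p <= lambda0_p <= lambda_{p+1}
        assert full[p] - 1e-12 <= lam0[p] <= full[p + 1] + 1e-12
    print(keep, lam0)          # x3 = 0 gives -2/3 and 1; x1 = 0 gives -19/6 and 1
```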


6. Suppose that two regular pencils of forms

$$A(x,x)-\lambda B(x,x),\qquad\tilde A(x,x)-\lambda\tilde B(x,x) \tag{91}$$

are given and that for every $x\neq 0$

$$\frac{A(x,x)}{B(x,x)}\le\frac{\tilde A(x,x)}{\tilde B(x,x)}.$$

Then obviously,




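$$\max\mu\Bigl(\frac{A}{B};\ L_1,L_2,\dots,L_{p-1}\Bigr)\le\max\mu\Bigl(\frac{\tilde A}{\tilde B};\ L_1,L_2,\dots,L_{p-1}\Bigr)\qquad(p=1,2,\dots,n).$$

Therefore, if we denote by $\lambda_1\le\lambda_2\le\dots\le\lambda_n$ and $\tilde\lambda_1\le\tilde\lambda_2\le\dots\le\tilde\lambda_n$, respectively, the characteristic values of the pencils (91), then we have:

$$\lambda_p\le\tilde\lambda_p\qquad(p=1,2,\dots,n).$$

Thus, we have proved the following theorem: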


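THEOREM 15: If two regular pencils of forms $A(x,x)-\lambda B(x,x)$ and $\tilde A(x,x)-\lambda\tilde B(x,x)$ with the characteristic values $\lambda_1\le\lambda_2\le\dots\le\lambda_n$ and $\tilde\lambda_1\le\tilde\lambda_2\le\dots\le\tilde\lambda_n$ are given, then the identical relation

$$\frac{A(x,x)}{B(x,x)}\le\frac{\tilde A(x,x)}{\tilde B(x,x)} \tag{92}$$

implies that

$$\lambda_p\le\tilde\lambda_p\qquad(p=1,2,\dots,n). \tag{93}$$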


Let us consider the special case where, in (92), $B(x,x)=\tilde B(x,x)$. In this case, the difference $\tilde A(x,x)-A(x,x)$ is a positive-semidefinite quadratic form and can therefore be expressed as a sum of independent positive squares:

$$\tilde A(x,x)=A(x,x)+\sum_{i=1}^{r}[X_i(x)]^2.$$

Then, when the $r$ independent constraints

$$X_1(x)=0,\quad X_2(x)=0,\quad\dots,\quad X_r(x)=0$$

are imposed, the forms $A(x,x)$ and $\tilde A(x,x)$ coincide, and the pencils $A(x,x)-\lambda B(x,x)$ and $\tilde A(x,x)-\lambda B(x,x)$ have the same characteristic values

$$\lambda_1^0\le\lambda_2^0\le\dots\le\lambda_{n-r}^0.$$

Applying Theorem 14 to both pencils $A(x,x)-\lambda B(x,x)$ and $\tilde A(x,x)-\lambda B(x,x)$, we have:

$$\tilde\lambda_p\le\lambda_p^0\le\lambda_{p+r}\qquad(p=1,2,\dots,n-r).$$

In conjunction with the inequality (93), this leads to the following theorem:




THEOREM 16: If $\lambda_1\le\lambda_2\le\dots\le\lambda_n$ and $\tilde\lambda_1\le\tilde\lambda_2\le\dots\le\tilde\lambda_n$ are the characteristic values of two regular pencils of forms $A(x,x)-\lambda B(x,x)$ and $\tilde A(x,x)-\lambda B(x,x)$, where

$$\tilde A(x,x)=A(x,x)+\sum_{i=1}^{r}[X_i(x)]^2,$$

and $X_i(x)$ ($i=1,2,\dots,r$) are independent linear forms, then the following inequalities hold:³⁰

$$\lambda_p\le\tilde\lambda_p\le\lambda_{p+r}\qquad(p=1,2,\dots,n). \tag{94}$$


In exactly the same way the following theorem is proved:

THEOREM 17: If $\lambda_1\le\lambda_2\le\dots\le\lambda_n$ and $\tilde\lambda_1\le\tilde\lambda_2\le\dots\le\tilde\lambda_n$ are the characteristic values of the regular pencils of forms $A(x,x)-\lambda B(x,x)$ and $A(x,x)-\lambda\tilde B(x,x)$, where the form $\tilde B(x,x)$ is obtained from $B(x,x)$ by adding $r$ positive squares, then the following inequalities hold:³¹

$$\lambda_{p-r}\le\tilde\lambda_p\le\lambda_p\qquad(p=1,2,\dots,n). \tag{95}$$

Note. In Theorems 16 and 17 we can claim that for some p we have, 

respectively Ap < A, and A, < A,, provided of course that r + 0.°? 
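A numerical sketch of (95) follows. It assumes NumPy, the hypothetical helper `pencil_values`, and one hypothetical positive square added to $B$. The example takes $A$ positive definite, as in the mechanical setting of § 8. The upper bound $\tilde\lambda_p \le \lambda_p$ comes from Theorem 15, and its hypothesis $A/\tilde B \le A/B$ uses $A(x,x) \ge 0$. Following footnote 31, the lower bound is checked only for $p > r$.

```python
import numpy as np


def pencil_values(A, B):
    """Ascending characteristic values of the regular pencil A(x,x) - lam*B(x,x)."""
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    return np.linalg.eigvalsh(Linv @ A @ Linv.T)


n = 4
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # positive definite here
B = np.diag([1.0, 2.0, 3.0, 4.0])

X = np.array([[1.0, 1.0, 0.0, 0.0]])   # hypothetical: r = 1 positive square
r = X.shape[0]
Bt = B + X.T @ X                       # Bt(x,x) = B(x,x) + X_1(x)^2

lam = pencil_values(A, B)
lam_t = pencil_values(A, Bt)
tol = 1e-12
upper_ok = np.all(lam_t <= lam + tol)                 # lam_t_p <= lam_p
lower_ok = np.all(lam[: n - r] <= lam_t[r:] + tol)    # lam_{p-r} <= lam_t_p for p > r
print(upper_ok, lower_ok)
```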


§ 8. Small Oscillations of a System with n Degrees of Freedom 


The results of the two preceding sections have important applications in the theory of small oscillations of a mechanical system with $n$ degrees of freedom.


1. We consider the free oscillations of a conservative mechanical system with $n$ degrees of freedom near a stable position of equilibrium. We shall give the deviation of the system from the position of equilibrium by means of independent generalized coordinates $q_1, q_2, \ldots, q_n$. The position of equilibrium itself corresponds to zero values of these coordinates: $q_1 = 0, q_2 = 0, \ldots, q_n = 0$. Then the kinetic energy of the system is represented as a quadratic form in the generalized velocities $\dot q_1, \dot q_2, \ldots, \dot q_n$:$^{33}$
$$T = \sum_{i,k=1}^{n} b_{ik}(q_1, q_2, \ldots, q_n)\,\dot q_i\dot q_k.$$


30 The second parts of these inequalities hold for $p \le n - r$ only.

31 The first parts of the inequalities hold for $p > r$.

32 See [17], pp. 71–73.

33 A dot denotes the derivative with respect to time.


Expanding the coefficients $b_{ik}(q_1, q_2, \ldots, q_n)$ as power series in $q_1, q_2, \ldots, q_n$,
$$b_{ik}(q_1, q_2, \ldots, q_n) = b_{ik} + \cdots \qquad (i, k = 1, 2, \ldots, n),$$
and keeping only the constant terms $b_{ik}$, since the deviations $q_1, q_2, \ldots, q_n$ are small, we then have:
$$T = \sum_{i,k=1}^{n} b_{ik}\dot q_i\dot q_k \qquad (b_{ik} = b_{ki};\ i, k = 1, 2, \ldots, n).$$
The kinetic energy is always positive and is zero only for zero velocities $\dot q_1 = \dot q_2 = \cdots = \dot q_n = 0$. Therefore $\sum_{i,k=1}^{n} b_{ik}\dot q_i\dot q_k$ is a positive-definite form.


The potential energy of the system is a function of the coordinates: $P(q_1, q_2, \ldots, q_n)$. Without loss of generality, we can take
$$P_0 = P(0, 0, \ldots, 0) = 0.$$
Then, expanding the potential energy as a power series in $q_1, q_2, \ldots, q_n$, we obtain:
$$P = \sum_{i=1}^{n} a_i q_i + \sum_{i,k=1}^{n} a_{ik}q_iq_k + \cdots.$$
Since in a position of equilibrium the potential energy always has a stationary value, we have
$$a_i = \left(\frac{\partial P}{\partial q_i}\right)_0 = 0 \qquad (i = 1, 2, \ldots, n).$$
Keeping only the terms of the second order in $q_1, q_2, \ldots, q_n$, we have
$$P = \sum_{i,k=1}^{n} a_{ik}q_iq_k \qquad (a_{ik} = a_{ki};\ i, k = 1, 2, \ldots, n).$$


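Thus, the potential energy $P$ and the kinetic energy $T$ are determined by two quadratic forms:
$$P = \sum_{i,k=1}^{n} a_{ik}q_iq_k, \qquad T = \sum_{i,k=1}^{n} b_{ik}\dot q_i\dot q_k, \qquad (96)$$
the second of which is positive definite.
We now write down the differential equations of motion in the form of Lagrange's equations of the second kind:$^{34}$
$$\frac{d}{dt}\frac{\partial T}{\partial\dot q_i} - \frac{\partial T}{\partial q_i} = -\frac{\partial P}{\partial q_i} \qquad (i = 1, 2, \ldots, n). \qquad (97)$$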


34 See, for example, G. K. Suslow (Suslov), Theoretische Mechanik, § 191. 


When we substitute for $T$ and $P$ their expressions from (96), we obtain:
$$\sum_{k=1}^{n} b_{ik}\ddot q_k + \sum_{k=1}^{n} a_{ik}q_k = 0 \qquad (i = 1, 2, \ldots, n). \qquad (98)$$
We introduce the real symmetric matrices
$$A = \|a_{ik}\|_1^n \quad\text{and}\quad B = \|b_{ik}\|_1^n$$
and the column matrix $q = (q_1, q_2, \ldots, q_n)$, and we write the system of equations (98) in the following matrix form:
$$B\ddot q + Aq = 0. \qquad (98')$$
We shall seek solutions of (98) in the form of harmonic oscillations
$$q_1 = v_1\sin(\omega t + \alpha), \quad q_2 = v_2\sin(\omega t + \alpha), \quad \ldots, \quad q_n = v_n\sin(\omega t + \alpha),$$
or, in matrix notation:
$$q = v\sin(\omega t + \alpha). \qquad (99)$$
Here $v = (v_1, v_2, \ldots, v_n)$ is the constant-amplitude column (constant-amplitude 'vector'), $\omega$ is the frequency, and $\alpha$ is the initial phase of the oscillation.
Substituting the expression (99) for $q$ in (98') and cancelling $\sin(\omega t + \alpha)$, we obtain:
$$Av = \lambda Bv \qquad (\lambda = \omega^2).$$
But this equation is the same as (49). Therefore the required amplitude vector is a principal vector, and the square of the frequency $\lambda = \omega^2$ is the corresponding characteristic value of the regular pencil of forms $A(x,x) - \lambda B(x,x)$.
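The short sketch below computes the frequencies as square roots of the characteristic values of the pencil. It assumes NumPy and uses a hypothetical system: two unit masses joined to two walls and to each other by three unit springs. For this system the matrices of the forms (96) are $A = \begin{pmatrix}2 & -1\\ -1 & 2\end{pmatrix}$ and $B = E$. Scaling both forms by the same factor (for example the physical $\tfrac12$) does not change $\lambda$.

```python
import numpy as np

# Hypothetical example: matrices of the forms (96) for two unit masses and three unit springs.
A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # potential energy
B = np.eye(2)                              # kinetic energy (positive definite)

# Roots of |A - lam*B| = 0 via the Cholesky reduction B = L L'
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
lam = np.linalg.eigvalsh(Linv @ A @ Linv.T)
omega = np.sqrt(lam)                       # lam = omega**2
print(omega)                               # approximately [1.0, 1.7320508]
```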

We subject the potential energy to an additional restriction by postulating that the function $P(q_1, q_2, \ldots, q_n)$ shall have a strict minimum in a position of equilibrium.$^{35}$
Then, by a theorem of Dirichlet,$^{36}$ the position of equilibrium is stable. On the other hand, our assumption means that the quadratic form $P = A(q,q)$ is also positive definite.
By Theorem 8, the regular pencil of forms $A(x,x) - \lambda B(x,x)$ has real characteristic values $\lambda_1, \lambda_2, \ldots, \lambda_n$ and $n$ corresponding principal characteristic vectors $v^1, v^2, \ldots, v^n$ $(v^k = (v_{1k}, v_{2k}, \ldots, v_{nk});\ k = 1, 2, \ldots, n)$ satisfying the condition
$$B(v^i, v^k) = \sum_{\mu,\nu=1}^{n} b_{\mu\nu}v_{\mu i}v_{\nu k} = \delta_{ik} \qquad (i, k = 1, 2, \ldots, n). \qquad (100)$$

35 I.e., that the value of $P_0$ in the position of equilibrium is less than all other values of the function in some neighborhood of the position of equilibrium.

36 See G. K. Suslow (Suslov), Theoretische Mechanik, § 210.


Since $A(x,x)$ is positive definite, all the characteristic values of the pencil $A(x,x) - \lambda B(x,x)$ are positive:$^{37}$
$$\lambda_k > 0 \qquad (k = 1, 2, \ldots, n).$$
But then there exist $n$ harmonic oscillations$^{38}$
$$v^k\sin(\omega_k t + \alpha_k) \qquad (\omega_k^2 = \lambda_k;\ k = 1, 2, \ldots, n), \qquad (101)$$
whose amplitude vectors $v^k = (v_{1k}, v_{2k}, \ldots, v_{nk})$ $(k = 1, 2, \ldots, n)$ satisfy the conditions of 'orthonormality' (100).

Since the equation (98') is linear, every oscillation can be obtained by a superposition of the harmonic oscillations (101):
$$q = \sum_{k=1}^{n} A_k\sin(\omega_k t + \alpha_k)\,v^k, \qquad (102)$$
where $A_k$ and $\alpha_k$ are arbitrary constants. For, whatever the values of these constants, the expression (102) is a solution of (98'). On the other hand, the arbitrary constants can be made to satisfy the following initial conditions:
$$q\big|_{t=0} = q_0, \qquad \dot q\big|_{t=0} = \dot q_0.$$
For from (102) we find:
$$q_0 = \sum_{k=1}^{n} A_k\sin\alpha_k\,v^k, \qquad \dot q_0 = \sum_{k=1}^{n}\omega_k A_k\cos\alpha_k\,v^k. \qquad (103)$$
Since the principal columns $v^1, v^2, \ldots, v^n$ are always linearly independent, the values $A_k\sin\alpha_k$ and $\omega_k A_k\cos\alpha_k$ $(k = 1, 2, \ldots, n)$, and hence the constants $A_k$ and $\alpha_k$ $(k = 1, 2, \ldots, n)$, are uniquely determined from (103).
The solution (102) of our system of differential equations can be written more conveniently:
$$q_i = \sum_{k=1}^{n} A_k\sin(\omega_k t + \alpha_k)\,v_{ik} \qquad (i = 1, 2, \ldots, n). \qquad (104)$$
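The solution of the initial-value problem through (103) can be sketched numerically as follows. The sketch assumes NumPy, and the names `normal_modes` and `free_oscillation` are hypothetical. Because the columns of $V$ are $B$-orthonormal by (100), we have $V^{-1} = V'B$. The equations (103) therefore give $A_k\sin\alpha_k = (v^k)'Bq_0$ and $\omega_k A_k\cos\alpha_k = (v^k)'B\dot q_0$ directly. The data are hypothetical, and $A$ is assumed positive definite.

```python
import numpy as np


def normal_modes(A, B):
    """lam_k and principal vectors v^k (columns of V) with V' B V = E, cf. (100)."""
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    lam, W = np.linalg.eigh(Linv @ A @ Linv.T)
    V = Linv.T @ W                      # A V = B V diag(lam) and V' B V = E
    return lam, V


def free_oscillation(A, B, q0, qdot0):
    """q(t) of the form (102) with q(0) = q0, q'(0) = qdot0 (A positive definite)."""
    lam, V = normal_modes(A, B)
    omega = np.sqrt(lam)                # omega_k**2 = lam_k > 0
    c = V.T @ B @ q0                    # A_k sin(alpha_k), first formula of (103)
    d = (V.T @ B @ qdot0) / omega       # A_k cos(alpha_k), second formula of (103)
    amp = np.hypot(c, d)                # A_k
    phase = np.arctan2(c, d)            # alpha_k

    def q(t):
        return V @ (amp * np.sin(omega * t + phase))

    def qdot(t):
        return V @ (amp * omega * np.cos(omega * t + phase))

    def qddot(t):
        return V @ (-amp * omega**2 * np.sin(omega * t + phase))

    return q, qdot, qddot


# Hypothetical data
A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
B = np.diag([1.0, 2.0, 1.0])
q0 = np.array([0.1, 0.0, -0.05])
qdot0 = np.array([0.0, 0.2, 0.0])
q, qdot, qddot = free_oscillation(A, B, q0, qdot0)

t = 1.3
print(np.allclose(B @ qddot(t) + A @ q(t), 0.0))   # the equation (98')
```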


Note that we could also derive the formulas (102) and (104) starting from Theorem 9. For let us consider a non-singular transformation of the variables with the matrix $V = \|v_{ik}\|_1^n$ that reduces the two forms $A(x,x)$ and $B(x,x)$ simultaneously to the canonical form (63). Setting
$$q_i = \sum_{k=1}^{n} v_{ik}\theta_k \qquad (i = 1, 2, \ldots, n) \qquad (105)$$
or, more briefly,
$$q = V\theta \qquad (\theta = (\theta_1, \theta_2, \ldots, \theta_n)), \qquad (106)$$
and observing that $\dot q = V\dot\theta$, we have:
$$P = A(q,q) = \sum_{k=1}^{n}\lambda_k\theta_k^2, \qquad T = B(\dot q,\dot q) = \sum_{k=1}^{n}\dot\theta_k^2. \qquad (107)$$
The coordinates $\theta_1, \theta_2, \ldots, \theta_n$ in which the potential and kinetic energies have a representation as in (107) are called principal coordinates.
We now make use of Lagrange's equations of the second kind (97) and substitute the expressions (107) for $P$ and $T$. We obtain:
$$\ddot\theta_k + \lambda_k\theta_k = 0 \qquad (k = 1, 2, \ldots, n). \qquad (108)$$
Since $A(q,q)$ is positive definite, all the numbers $\lambda_1, \lambda_2, \ldots, \lambda_n$ are positive and can be represented in the form
$$\lambda_k = \omega_k^2 \qquad (\omega_k > 0;\ k = 1, 2, \ldots, n). \qquad (109)$$
From (108) and (109), we find:
$$\theta_k = A_k\sin(\omega_k t + \alpha_k) \qquad (k = 1, 2, \ldots, n). \qquad (110)$$
When we substitute these expressions for $\theta_k$ in (105), we again obtain the formulas (104) and therefore (102). The values $v_{ik}$ $(i, k = 1, 2, \ldots, n)$ are the same in both methods, because the matrix $V = \|v_{ik}\|_1^n$ in (106) is, by Theorem 9, a principal matrix of the regular pencil of forms $A(x,x) - \lambda B(x,x)$.

37 This follows, for example, from the representation (63).

38 Here the initial phases $\alpha_k$ $(k = 1, 2, \ldots, n)$ are arbitrary constants.


2. We also mention a mechanical interpretation of Theorems 14 and 15.
We number the frequencies $\omega_1, \omega_2, \ldots, \omega_n$ of the given mechanical system in non-descending order:
$$0 < \omega_1 \le \omega_2 \le \cdots \le \omega_n.$$
The order of the corresponding characteristic values $\lambda_k = \omega_k^2$ $(k = 1, 2, \ldots, n)$ of the pencil $A(x,x) - \lambda B(x,x)$ is then also determined:
$$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n.$$




We impose $h$ independent finite stationary constraints$^{39}$ on the given system. Since the deviations $q_1, q_2, \ldots, q_n$ are supposed to be small, these constraints can be assumed to be linear in $q_1, q_2, \ldots, q_n$:
$$L_1(q) = 0, \quad L_2(q) = 0, \quad \ldots, \quad L_h(q) = 0.$$
After the constraints are imposed, our system has $n - h$ degrees of freedom. The frequencies of the system,
$$\omega_1^0 \le \omega_2^0 \le \cdots \le \omega_{n-h}^0,$$
are connected with the characteristic values $\lambda_1^0 \le \lambda_2^0 \le \cdots \le \lambda_{n-h}^0$ of the pencil $A(x,x) - \lambda B(x,x)$, subject to the constraints $L_1, L_2, \ldots, L_h$, by the relations $\lambda_j^0 = (\omega_j^0)^2$ $(j = 1, 2, \ldots, n - h)$. Therefore Theorem 14 immediately implies that
$$\omega_j \le \omega_j^0 \le \omega_{j+h} \qquad (j = 1, 2, \ldots, n - h).$$
Thus: When $h$ constraints are imposed, the frequencies of a system can only increase, but the value of the new $j$-th frequency $\omega_j^0$ cannot exceed the value of the previous $(j+h)$-th frequency $\omega_{j+h}$.
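The effect of constraints on the frequencies can be checked with a short sketch. It assumes NumPy, and the helper names and the single constraint $L_1(q) = q_1 - q_3 = 0$ are hypothetical. The admissible coordinates form the null space of the constraint matrix $C$. With an orthonormal basis $N$ of that null space we can write $q = Ny$, and the reduced forms then have matrices $N'AN$ and $N'BN$.

```python
import numpy as np


def pencil_values(A, B):
    """Ascending characteristic values of the regular pencil A - lam*B."""
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    return np.linalg.eigvalsh(Linv @ A @ Linv.T)


def constrained_frequencies(A, B, C):
    """Frequencies after imposing C q = 0 (the rows of C are assumed independent)."""
    h = C.shape[0]
    _, _, Vt = np.linalg.svd(C)
    N = Vt[h:].T                      # orthonormal basis of {q : C q = 0}, shape n x (n - h)
    return np.sqrt(pencil_values(N.T @ A @ N, N.T @ B @ N))


n = 4
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # positive definite
B = np.diag([1.0, 1.0, 2.0, 2.0])
C = np.array([[1.0, 0.0, -1.0, 0.0]])                    # hypothetical L_1(q) = q_1 - q_3
h = C.shape[0]

omega = np.sqrt(pencil_values(A, B))
omega0 = constrained_frequencies(A, B, C)
tol = 1e-12
print(np.all(omega[: n - h] <= omega0 + tol), np.all(omega0 <= omega[h:] + tol))
```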

In exactly the same way, we can assert on the basis of Theorem 15 that: With increasing rigidity of the system, i.e., with an increase of the form $A(q,q)$ for the potential energy (without a change in $B(q,q)$), the frequencies can only increase; and with increasing inertia of the system, i.e., with an increase of the form $B(q,q)$ for the kinetic energy (without a change in $A(q,q)$), the frequencies can only decrease.

Theorems 16 and 17 sharpen this proposition further.$^{40}$


§ 9. Hermitian Forms$^{41}$


1. All the results of §§ 1–7 of this chapter that were established for quadratic forms can be extended to hermitian forms.
We recall$^{42}$ that a hermitian form is an expression
$$H(x,x) = \sum_{i,k=1}^{n} h_{ik}x_i\bar x_k \qquad (h_{ik} = \bar h_{ki};\ i, k = 1, 2, \ldots, n). \qquad (111)$$
To the hermitian form (111) there corresponds the following bilinear hermitian form:
$$H(x,y) = \sum_{i,k=1}^{n} h_{ik}x_i\bar y_k; \qquad (112)$$
moreover,
$$H(y,x) = \overline{H(x,y)} \qquad (113)$$
and, in particular,
$$H(x,x) = \overline{H(x,x)}, \qquad (113')$$
i.e., the hermitian form $H(x,x)$ assumes real values only.
The coefficient matrix $H = \|h_{ik}\|_1^n$ of the hermitian form is hermitian, i.e., $H^* = H$.$^{43}$
By means of the matrix $H = \|h_{ik}\|_1^n$ we can represent $H(x,y)$ and, in particular, $H(x,x)$ in the form of a product of three matrices, a row, a square, and a column matrix:$^{44}$
$$H(x,y) = x'H\bar y, \qquad H(x,x) = x'H\bar x. \qquad (114)$$
If
$$x = \sum_{i=1}^{m} c_iu^i, \qquad y = \sum_{k=1}^{p} d_kv^k, \qquad (115)$$
where $u^i$, $v^k$ are column matrices and $c_i$, $d_k$ are complex numbers $(i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, p)$, then
$$H(x,y) = \sum_{i=1}^{m}\sum_{k=1}^{p} c_i\bar d_k\,H(u^i, v^k). \qquad (116)$$
We subject the variables $x_1, x_2, \ldots, x_n$ to the linear transformation
$$x_i = \sum_{k=1}^{n} t_{ik}\xi_k \qquad (i = 1, 2, \ldots, n) \qquad (117)$$

39 A finite stationary constraint is expressed by an equation $f(q_1, q_2, \ldots, q_n) = 0$, where $f(q_1, q_2, \ldots, q_n)$ is some function of the generalized coordinates.

40 The reader can find an account of the oscillatory properties of elastic oscillations of a system with $n$ degrees of freedom in [17], Chapter III.

41 In the preceding sections, all the numbers and variables were real. In this section, the numbers are complex and the variables assume complex values.

42 See Chapter IX, § 2.

43 A matrix symbol followed by an asterisk * denotes the matrix that is obtained from the given one by transposition and replacement of all the elements by their complex conjugates ($H^* = \bar H'$).

44 Here $x = (x_1, x_2, \ldots, x_n)$, $\bar x = (\bar x_1, \bar x_2, \ldots, \bar x_n)$, $y = (y_1, y_2, \ldots, y_n)$, $\bar y = (\bar y_1, \bar y_2, \ldots, \bar y_n)$; the sign $'$ denotes transposition.


or, in matrix notation,
$$x = T\xi \qquad (T = \|t_{ik}\|_1^n). \qquad (117')$$
After the transformation, $H(x,x)$ assumes the form
$$\tilde H(\xi,\xi) = \sum_{i,k=1}^{n}\tilde h_{ik}\xi_i\bar\xi_k,$$
where the new coefficient matrix $\tilde H = \|\tilde h_{ik}\|_1^n$ is connected with the old coefficient matrix $H = \|h_{ik}\|_1^n$ by the formula
$$\tilde H = T'H\bar T. \qquad (118)$$
This is immediately clear when, in the second of the formulas (114), $x$ is replaced by $T\xi$.
If we set $\bar T = W$, then we can rewrite (118) as follows:
$$\tilde H = W^*HW. \qquad (119)$$
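The transformation rule (118) and its variant (119) are easy to verify numerically. The sketch below assumes NumPy and uses random hypothetical data. It uses the convention (114), $H(x,y) = x'H\bar y$, which is what makes the new matrix $T'H\bar T$ rather than $T^*HT$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3


def random_complex(shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


def form(H, x, y):
    """Bilinear hermitian form (114): H(x, y) = x' H conj(y)."""
    return x @ H @ y.conj()


M = random_complex((n, n))
H = M + M.conj().T                 # a hermitian matrix: H^* = H
T = random_complex((n, n))         # matrix of the substitution x = T xi, (117')
H_new = T.T @ H @ T.conj()         # (118)
W = T.conj()
print(np.allclose(H_new, W.conj().T @ H @ W))          # (119) with W = conj(T)

xi = random_complex(n)
x = T @ xi
print(np.isclose(form(H, x, x), form(H_new, xi, xi)))  # same value of the form
print(abs(form(H, x, x).imag) < 1e-9)                  # (113'): the value is real
```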


From the formula (118) it follows that $H$ and $\tilde H$ have the same rank provided the transformation (117) is non-singular ($|T| \ne 0$). The rank of $H$ is called the rank of the form $H(x,x)$.

The determinant $|H|$ is called the discriminant of $H(x,x)$. From (118) we obtain the formula for the transformation of the discriminant on transition to new variables:
$$|\tilde H| = |H|\,|T|\,|\bar T|.$$
A hermitian form is called singular if its discriminant is zero. Obviously, a singular form remains singular under any transformation of the variables (117).

A hermitian form $H(x,x)$ can be represented in infinitely many ways in the form
$$H(x,x) = \sum_{i=1}^{r} a_iX_i\bar X_i, \qquad (120)$$
where $a_i \ne 0$ $(i = 1, 2, \ldots, r)$ are real numbers and
$$X_i = \sum_{k=1}^{n}\alpha_{ik}x_k \qquad (i = 1, 2, \ldots, r)$$
are independent complex linear forms in the variables $x_1, x_2, \ldots, x_n$.$^{45}$

We shall call the right-hand side of (120) a sum of linearly independent squares$^{46}$, and every term in the sum a positive or a negative square according as $a_i > 0$ or $a_i < 0$. Just as for quadratic forms, the number $r$ in (120) is equal to the rank of the form $H(x,x)$.

45 Therefore $r \le n$.


THEOREM 18 (The Law of Inertia for Hermitian Forms): In the representation of a hermitian form $H(x,x)$ as a sum of linearly independent squares,
$$H(x,x) = \sum_{i=1}^{r} a_iX_i\bar X_i,$$
the number of positive squares and the number of negative squares do not depend on the choice of the representation.

The proof is completely analogous to the proof of Theorem 1 (p. 297).

The difference $\sigma$ between the number $\pi$ of positive squares and the number $\nu$ of negative squares in (120) is called the signature of the hermitian form $H(x,x)$: $\sigma = \pi - \nu$.

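Lagrange's method of reducing quadratic forms to sums of squares can also be used for hermitian forms. The only change is that the fundamental formulas (15) and (16) on p. 299 must be replaced by the formulas$^{47}$
$$H(x,x) = \frac{1}{h_{gg}}\left|\sum_{k=1}^{n} h_{kg}x_k\right|^2 + H_1(x,x), \qquad (121)$$
$$H(x,x) = \frac{1}{2|h_{fg}|^2}\left\{\left|\sum_{k=1}^{n}(h_{kf} + h_{gf}h_{kg})x_k\right|^2 - \left|\sum_{k=1}^{n}(h_{kf} - h_{gf}h_{kg})x_k\right|^2\right\} + H_2(x,x). \qquad (122)$$
Here $H_1(x,x)$ does not contain $x_g$, and $H_2(x,x)$ contains neither $x_f$ nor $x_g$.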
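The following sketch checks numerically that the remainders $H_1$ in (121) and $H_2$ in (122) no longer involve the eliminated variables. It assumes NumPy and hypothetical matrices, and uses the convention (111): the coefficient of $x_k\bar x_l$ in $|\sum_k u_kx_k|^2$ is $u_k\bar u_l$. Indices in the code are 0-based, so `g = 0` stands for $x_1$.

```python
import numpy as np


def square_matrix(u):
    """Matrix of |u_1 x_1 + ... + u_n x_n|^2 when H(x,x) = sum_{i,k} h_ik x_i conj(x_k)."""
    return np.outer(u, u.conj())


def split_121(H, g):
    """Matrix of H_1 in (121); requires h_gg != 0."""
    return H - square_matrix(H[:, g]) / H[g, g].real


def split_122(H, f, g):
    """Matrix of H_2 in (122); requires h_ff = h_gg = 0 and h_fg != 0."""
    c = H[g, f]                                   # h_gf = conj(h_fg)
    u = H[:, f] + c * H[:, g]
    w = H[:, f] - c * H[:, g]
    return H - (square_matrix(u) - square_matrix(w)) / (2.0 * abs(H[f, g]) ** 2)


# Hypothetical hermitian matrix with h_11 != 0
H = np.array([[2.0, 1 + 1j, 3j], [1 - 1j, 0.0, 2.0], [-3j, 2.0, 1.0]])
H1 = split_121(H, 0)
print(np.allclose(H1[0, :], 0), np.allclose(H1[:, 0], 0))    # x_1 eliminated

# Hypothetical hermitian matrix with h_11 = h_22 = 0, h_12 != 0
K = np.array([[0.0, 1 + 2j, 1.0], [1 - 2j, 0.0, 3j], [1.0, -3j, 5.0]])
H2 = split_122(K, 0, 1)
print(np.allclose(H2[:2, :], 0), np.allclose(H2[:, :2], 0))  # x_1 and x_2 eliminated
```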


Let us proceed to establish Jacobi's formula for a hermitian form $H(x,x) = \sum_{i,k=1}^{n} h_{ik}x_i\bar x_k$ of rank $r$. Here, as in the case of a quadratic form, we assume that
$$D_k = H\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix} \ne 0 \qquad (k = 1, 2, \ldots, r). \qquad (123)$$
These inequalities enable us to use Theorem 2 of Chapter II (p. 38) on the representation of an arbitrary square matrix as a product of three matrices: a lower triangular matrix $F$, a diagonal matrix $D$, and an upper triangular matrix $L$. We apply this theorem to the matrix $H = \|h_{ik}\|_1^n$ and obtain
$$H = F\left\{D_1, \frac{D_2}{D_1}, \ldots, \frac{D_r}{D_{r-1}}, 0, \ldots, 0\right\}L, \qquad (124)$$
where $F = \|f_{ik}\|_1^n$, $L = \|l_{ik}\|_1^n$, and
$$f_{jk} = \frac{1}{D_k}H\begin{pmatrix}1 & \cdots & k-1 & j\\ 1 & \cdots & k-1 & k\end{pmatrix}, \qquad l_{kj} = \frac{1}{D_k}H\begin{pmatrix}1 & \cdots & k-1 & k\\ 1 & \cdots & k-1 & j\end{pmatrix} \qquad (j = k, k+1, \ldots, n;\ k = 1, 2, \ldots, r), \qquad (125)$$
$$f_{ik} = l_{ki} = 0 \qquad (i < k;\ i, k = 1, 2, \ldots, n). \qquad (126)$$
Since $H = \|h_{ik}\|_1^n$ is a hermitian matrix, it follows from these equations that
$$f_{ik} = \bar l_{ki} \qquad (k = 1, 2, \ldots, r;\ i = 1, 2, \ldots, n). \qquad (127)$$
Since all the elements in the last $n - r$ columns of $F$ and the last $n - r$ rows of $L$ can be chosen arbitrarily,$^{48}$ we choose these elements such that 1) the relations (127) hold for all $i, k$:
$$f_{ik} = \bar l_{ki} \qquad (i, k = 1, 2, \ldots, n),$$
and 2) $|F| = |L| \ne 0$. Then
$$F = L^*, \qquad (128)$$
and (124) assumes the form
$$H = L^*\left\{D_1, \frac{D_2}{D_1}, \ldots, \frac{D_r}{D_{r-1}}, 0, \ldots, 0\right\}L. \qquad (129)$$
Setting
$$T = \|t_{ik}\|_1^n = \bar L, \qquad (130)$$
we write (129) as follows:
$$H = T'\left\{D_1, \frac{D_2}{D_1}, \ldots, \frac{D_r}{D_{r-1}}, 0, \ldots, 0\right\}\bar T \qquad (|T| \ne 0). \qquad (131)$$
A comparison of this formula with (118) shows that the hermitian form
$$\sum_{k=1}^{r}\frac{D_k}{D_{k-1}}\xi_k\bar\xi_k \qquad (D_0 = 1) \qquad (132)$$
goes over into $H(x,x)$ under the transformation of the variables
$$\xi_i = \sum_{k=1}^{n} t_{ik}x_k \qquad (i = 1, 2, \ldots, n).$$
In other words, Jacobi's formula holds:
$$H(x,x) = \sum_{k=1}^{r}\frac{D_k}{D_{k-1}}X_k\bar X_k \qquad (D_0 = 1), \qquad (133)$$
where
$$X_k = x_k + t_{k,k+1}x_{k+1} + \cdots + t_{kn}x_n \qquad (k = 1, 2, \ldots, r) \qquad (134)$$
and
$$t_{kj} = \frac{1}{D_k}H\begin{pmatrix}1 & 2 & \cdots & k-1 & j\\ 1 & 2 & \cdots & k-1 & k\end{pmatrix} \qquad (j = k+1, \ldots, n;\ k = 1, 2, \ldots, r). \qquad (135)$$
The linear forms $X_1, X_2, \ldots, X_r$ are independent, since $X_k$ contains the variable $x_k$, which does not occur in the subsequent forms $X_{k+1}, \ldots, X_r$.
When we introduce, in place of $X_1, X_2, \ldots, X_r$, the linearly independent forms
$$Y_k = D_kX_k \qquad (k = 1, 2, \ldots, r), \qquad (136)$$
we can write Jacobi's formula (133) in the form
$$H(x,x) = \sum_{k=1}^{r}\frac{Y_k\bar Y_k}{D_{k-1}D_k} \qquad (D_0 = 1). \qquad (137)$$
According to Jacobi's formula (137), the number of negative squares in the representation of $H(x,x)$ is equal to the number of variations of sign in the sequence $1, D_1, D_2, \ldots, D_r$:
$$\nu = V(1, D_1, D_2, \ldots, D_r),$$
so that the signature $\sigma$ of $H(x,x)$ is determined by the formula
$$\sigma = r - 2V(1, D_1, D_2, \ldots, D_r). \qquad (138)$$

46 This terminology is connected with the fact that $X_i\bar X_i$ is the square of the modulus of $X_i$ ($X_i\bar X_i = |X_i|^2$).

47 The formula (121) is applicable when $h_{gg} \ne 0$; and (122), when $h_{ff} = h_{gg} = 0$, $h_{fg} \ne 0$.

48 These elements, in fact, drop out of the right-hand side of (124), because the last $n - r$ diagonal elements of $D$ are zero.


All the remarks about the special cases that may occur, made for quadratic forms (§ 3), automatically carry over to hermitian forms.
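Jacobi's formula (133)–(135) and the signature rule (138) can be checked on a small hypothetical matrix of full rank whose leading minors are all non-zero. The sketch below assumes NumPy, keeps the convention $H(x,x) = x'H\bar x$, and uses hypothetical helper names. For this matrix $D_1 = 2$, $D_2 = -4$, $D_3 = -20$, so there is exactly one variation of sign.

```python
import numpy as np


def leading_minors(H):
    """[D_0, D_1, ..., D_n] with D_0 = 1; real because H is hermitian."""
    n = H.shape[0]
    return [1.0] + [np.linalg.det(H[:k, :k]).real for k in range(1, n + 1)]


def jacobi_coefficients(H):
    """Upper unitriangular t with t[k-1, j-1] = t_kj of (135), for 1-based k < j.

    Assumes full rank with D_1, ..., D_n all different from zero.
    """
    n = H.shape[0]
    D = leading_minors(H)
    t = np.eye(n, dtype=complex)
    for k in range(1, n + 1):
        for j in range(k + 1, n + 1):
            rows = list(range(k - 1)) + [j - 1]   # rows 1, ..., k-1, j
            cols = list(range(k - 1)) + [k - 1]   # columns 1, ..., k-1, k
            t[k - 1, j - 1] = np.linalg.det(H[np.ix_(rows, cols)]) / D[k]
    return D, t


def sign_variations(seq):
    signs = [v > 0 for v in seq]
    return sum(a != b for a, b in zip(signs, signs[1:]))


H = np.array([[2.0, 1 + 1j, 0.0], [1 - 1j, -1.0, 2j], [0.0, -2j, 3.0]])
D, t = jacobi_coefficients(H)

x = np.array([0.3 - 1j, 2.0 + 0.5j, -1.0j])
X = t @ x                                              # X_k from (134)
lhs = (x @ H @ x.conj()).real                          # H(x, x)
rhs = sum(D[k] / D[k - 1] * abs(X[k - 1]) ** 2 for k in range(1, 4))   # (133)
print(np.isclose(lhs, rhs))

nu = sign_variations(D)                                # number of negative squares
print(nu, 3 - 2 * nu)                                  # prints: 1 1  (nu and sigma)
```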


DEFINITION 5: A hermitian form $H(x,x) = \sum_{i,k=1}^{n} h_{ik}x_i\bar x_k$ is called positive (negative) semidefinite if for arbitrary values of the variables $x_1, x_2, \ldots, x_n$, not all equal to zero,
$$H(x,x) \ge 0 \quad (\le 0).$$

DEFINITION 6: A hermitian form $H(x,x) = \sum_{i,k=1}^{n} h_{ik}x_i\bar x_k$ is called positive (negative) definite if for arbitrary values of the variables $x_1, x_2, \ldots, x_n$, not all equal to zero,
$$H(x,x) > 0 \quad (< 0).$$


THEOREM 19: A hermitian form $H(x,x) = \sum_{i,k=1}^{n} h_{ik}x_i\bar x_k$ is positive definite if and only if the following inequalities hold:
$$D_k = H\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix} > 0 \qquad (k = 1, 2, \ldots, n). \qquad (139)$$

THEOREM 20: A hermitian form $H(x,x) = \sum_{i,k=1}^{n} h_{ik}x_i\bar x_k$ is positive semidefinite if and only if all the principal minors of $H = \|h_{ik}\|_1^n$ are non-negative:
$$H\begin{pmatrix}i_1 & i_2 & \cdots & i_p\\ i_1 & i_2 & \cdots & i_p\end{pmatrix} \ge 0 \qquad (1 \le i_1 < i_2 < \cdots < i_p \le n;\ p = 1, 2, \ldots, n). \qquad (140)$$

The proofs of Theorems 19 and 20 are completely analogous to the proofs of Theorems 3 and 4 for quadratic forms.

The conditions for a hermitian form $H(x,x)$ to be negative definite or semidefinite are obtained by applying (139) and (140) to the form $-H(x,x)$.
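The sketch below illustrates the difference between the tests (139) and (140). It assumes NumPy, and the function names and matrices are hypothetical. The second matrix has leading minors $0, 0$, which are non-negative, yet the form is not semidefinite. Only the full set of principal minors in (140) detects this.

```python
import itertools

import numpy as np


def is_positive_definite(H, tol=1e-12):
    """Theorem 19: all leading principal minors D_1, ..., D_n are positive."""
    n = H.shape[0]
    return all(np.linalg.det(H[:k, :k]).real > tol for k in range(1, n + 1))


def is_positive_semidefinite(H, tol=1e-12):
    """Theorem 20: every principal minor (not only the leading ones) is non-negative."""
    n = H.shape[0]
    for p in range(1, n + 1):
        for idx in itertools.combinations(range(n), p):
            if np.linalg.det(H[np.ix_(idx, idx)]).real < -tol:
                return False
    return True


H1 = np.array([[2.0, 1j], [-1j, 2.0]])     # D_1 = 2, D_2 = 3: positive definite
H2 = np.array([[0.0, 0.0], [0.0, -1.0]])   # leading minors 0, 0, but not semidefinite
H3 = np.array([[1.0, 1.0], [1.0, 1.0]])    # semidefinite, not definite

print(is_positive_definite(H1), is_positive_semidefinite(H1))   # True True
print(is_positive_definite(H2), is_positive_semidefinite(H2))   # False False
print(is_positive_definite(H3), is_positive_semidefinite(H3))   # False True
```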

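From Theorem 5' of Chapter IX (p. 274), we obtain the theorem on the reduction of a hermitian form to principal axes:

THEOREM 21: Every hermitian form $H(x,x) = \sum_{i,k=1}^{n} h_{ik}x_i\bar x_k$ can be reduced by a unitary transformation of the variables
$$x = U\xi \qquad (UU^* = E) \qquad (141)$$
to the canonical form
$$\tilde H(\xi,\xi) = \sum_{k=1}^{n}\lambda_k\xi_k\bar\xi_k, \qquad (142)$$
where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the characteristic values of the matrix $H = \|h_{ik}\|_1^n$.

Theorem 21 follows from the formula
$$H = \bar U\,\|\lambda_i\delta_{ik}\|_1^n\,U' \qquad (U'\bar U = E), \qquad (143)$$
which by (118) means that $U'H\bar U = \|\lambda_i\delta_{ik}\|_1^n$.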
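A numerical sketch of Theorem 21 follows. It assumes NumPy and a hypothetical hermitian matrix. NumPy's `eigh` returns $H = Q\Lambda Q^*$ with $Q$ unitary. In the convention $H(x,x) = x'H\bar x$ of (114), the substitution $x = U\xi$ produces the matrix $U'H\bar U$ by (118). The choice $U = \bar Q$ therefore brings the form to the canonical form (142).

```python
import numpy as np

# Hypothetical hermitian matrix
H = np.array([[2.0, 1 - 2j, 0.5j], [1 + 2j, -1.0, 1.0], [-0.5j, 1.0, 0.0]])
lam, Q = np.linalg.eigh(H)          # H = Q diag(lam) Q^*, Q unitary

U = Q.conj()                        # x = U xi, so U' H conj(U) = Q^* H Q
print(np.allclose(U @ U.conj().T, np.eye(3)))              # (141): U U^* = E
print(np.allclose(U.T @ H @ U.conj(), np.diag(lam)))       # canonical coefficient matrix

xi = np.array([1.0 + 1j, -0.5, 2j])
x = U @ xi
print(np.isclose(x @ H @ x.conj(), np.sum(lam * np.abs(xi) ** 2)))   # (142)
```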


Let $H(x,x) = \sum_{i,k=1}^{n} h_{ik}x_i\bar x_k$ and $G(x,x) = \sum_{i,k=1}^{n} g_{ik}x_i\bar x_k$ be two hermitian forms. We shall consider the pencil of hermitian forms $H(x,x) - \lambda G(x,x)$ ($\lambda$ is a real parameter). This pencil is called regular if $G(x,x)$ is positive definite. By means of the hermitian matrices $H = \|h_{ik}\|_1^n$ and $G = \|g_{ik}\|_1^n$ we form the equation
$$|H - \lambda G| = 0.$$
This equation is called the characteristic equation of the pencil of hermitian forms. Its roots are called the characteristic values of the pencil.

If $\lambda_0$ is a characteristic value of the pencil, then there exists a column $z = (z_1, z_2, \ldots, z_n) \ne 0$ such that
$$Hz = \lambda_0 Gz.$$
We shall call the column $z$ a principal column or principal vector of the pencil $H(x,x) - \lambda G(x,x)$ corresponding to the characteristic value $\lambda_0$.

Then the following theorem holds:

THEOREM 22: The characteristic equation of a regular pencil of hermitian forms $H(x,x) - \lambda G(x,x)$ has $n$ real roots $\lambda_1, \lambda_2, \ldots, \lambda_n$. To these roots there correspond $n$ principal vectors $z^1, z^2, \ldots, z^n$ satisfying the conditions of 'orthonormality':
$$G(z^i, z^k) = \delta_{ik} \qquad (i, k = 1, 2, \ldots, n).$$

The proof is completely analogous to the proof of Theorem 8.

All extremal properties of the characteristic values of a regular pencil of quadratic forms remain valid for hermitian forms.

Theorems 10–17 remain valid if the term 'quadratic form' is replaced throughout by the term 'hermitian form.' The proofs of the theorems are then unchanged.


§ 10. Hankel Forms 


1. Let $s_0, s_1, \ldots, s_{2n-2}$ be a sequence of numbers. By means of these numbers we form a quadratic form in $n$ variables
$$S(x,x) = \sum_{i,k=0}^{n-1} s_{i+k}x_ix_k. \qquad (144)$$
This is called a Hankel form. The matrix $S = \|s_{i+k}\|_0^{n-1}$ corresponding to this form is called a Hankel matrix. It has the form
$$S = \begin{pmatrix} s_0 & s_1 & s_2 & \cdots & s_{n-1}\\ s_1 & s_2 & s_3 & \cdots & s_n\\ s_2 & s_3 & s_4 & \cdots & s_{n+1}\\ \vdots & \vdots & \vdots & & \vdots\\ s_{n-1} & s_n & s_{n+1} & \cdots & s_{2n-2}\end{pmatrix}.$$
We denote the sequence of principal minors of $S$ by $D_1, D_2, \ldots, D_n$:
$$D_p = |s_{i+k}|_0^{p-1} \qquad (p = 1, 2, \ldots, n).$$


In this section we shall derive the fundamental results of Frobenius about the rank and signature of real Hankel forms.$^{49}$
We begin by proving two lemmas.

LEMMA 1: If the first $h$ rows of the Hankel matrix $S = \|s_{i+k}\|_0^{n-1}$ are linearly independent, but the first $h + 1$ rows are linearly dependent, then
$$D_h \ne 0.$$


Proof. We denote the first $h + 1$ rows of $S$ by $R_1, R_2, \ldots, R_h, R_{h+1}$. By assumption, $R_1, R_2, \ldots, R_h$ are linearly independent and $R_{h+1}$ is expressed linearly in terms of them:
$$R_{h+1} = \sum_{j=1}^{h}\alpha_jR_{h-j+1},$$
or
$$s_q = \sum_{j=1}^{h}\alpha_js_{q-j} \qquad (q = h, h+1, \ldots, h+n-1). \qquad (145)$$
We write down the matrix formed from the first $h$ rows $R_1, R_2, \ldots, R_h$ of $S$:
$$\begin{pmatrix} s_0 & s_1 & s_2 & \cdots & s_{n-1}\\ s_1 & s_2 & s_3 & \cdots & s_n\\ \vdots & \vdots & \vdots & & \vdots\\ s_{h-1} & s_h & s_{h+1} & \cdots & s_{h+n-2}\end{pmatrix}. \qquad (146)$$
This matrix is of rank $h$. On the other hand, by (145) every column of the matrix can be expressed linearly in terms of the preceding $h$ columns, and hence in terms of the first $h$ columns. But since the rank of (146) is $h$, these first $h$ columns of (146) must then be linearly independent, i.e.,
$$D_h \ne 0.$$
This proves the lemma.
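Lemma 1 can be illustrated with exact rational arithmetic, so that linear dependence is decided without rounding. The sketch assumes Python's standard `fractions` module. The helpers `det`, `rank`, and `hankel` are hypothetical, and so is the data $s_k = 2^k + 3^k$, which satisfies $s_{k+2} = 5s_{k+1} - 6s_k$ and therefore gives a Hankel matrix of rank 2.

```python
from fractions import Fraction


def det(M):
    """Exact determinant of a square matrix, by Gaussian elimination over Q."""
    M = [[Fraction(v) for v in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return result


def rank(M):
    """Exact rank of a matrix, by Gaussian elimination over Q."""
    M = [[Fraction(v) for v in row] for row in M]
    rk = 0
    for c in range(len(M[0])):
        p = next((r for r in range(rk, len(M)) if M[r][c] != 0), None)
        if p is None:
            continue
        M[rk], M[p] = M[p], M[rk]
        for r in range(len(M)):
            if r != rk and M[r][c] != 0:
                f = M[r][c] / M[rk][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[rk])]
        rk += 1
    return rk


def hankel(s, n):
    """Hankel matrix ||s_{i+k}||_0^{n-1} built from s_0, ..., s_{2n-2}."""
    return [[s[i + k] for k in range(n)] for i in range(n)]


n = 4
s = [2**k + 3**k for k in range(2 * n - 1)]     # hypothetical data
S = hankel(s, n)

# smallest h for which the first h + 1 rows are linearly dependent
h = next(h for h in range(1, n) if rank(S[: h + 1]) == h)
D_h = det([row[:h] for row in S[:h]])
print(h, D_h)      # 2 1  (Lemma 1: D_h != 0)
```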


49 See [162].




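LEMMA 2: If in the matrix $S = \|s_{i+k}\|_0^{n-1}$, for a certain $h$ $(< n)$,
$$D_h \ne 0, \qquad D_{h+1} = \cdots = D_n = 0, \qquad (147)$$
and
$$t_{ik} = \frac{1}{D_h}S\begin{pmatrix}1 & \cdots & h & h+i+1\\ 1 & \cdots & h & h+k+1\end{pmatrix} = \frac{1}{D_h}\begin{vmatrix} s_0 & \cdots & s_{h-1} & s_{h+k}\\ \vdots & & \vdots & \vdots\\ s_{h-1} & \cdots & s_{2h-2} & s_{2h+k-1}\\ s_{h+i} & \cdots & s_{2h+i-1} & s_{2h+i+k}\end{vmatrix} \qquad (i, k = 0, 1, \ldots, n-h-1), \qquad (148)$$
then the matrix $T = \|t_{ik}\|_0^{n-h-1}$ is also a Hankel matrix and all its elements above the second diagonal are zero, i.e., there exist numbers $t_{n-h-1}, \ldots, t_{2n-2h-2}$ such that
$$t_{ik} = t_{i+k} \qquad (i, k = 0, 1, \ldots, n-h-1;\ t_0 = t_1 = \cdots = t_{n-h-2} = 0).$$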


Proof. We introduce the matrices
$$T_p = \|t_{ik}\|_0^{p-1} \qquad (p = 1, 2, \ldots, n-h).$$
In this notation $T = T_{n-h}$.

We shall show that every $T_p$ $(p = 1, 2, \ldots, n-h)$ is a Hankel matrix and that $t_{ik} = 0$ for $i + k \le p - 2$. The proof is by induction on $p$.

For the matrix $T_1$ our assertion is trivial; for $T_2$ it is obvious, since
$$T_2 = \begin{pmatrix} t_{00} & t_{01}\\ t_{10} & t_{11}\end{pmatrix}, \qquad t_{01} = t_{10}\ \ (\text{because } S \text{ is symmetric}), \qquad t_{00} = \frac{D_{h+1}}{D_h} = 0.$$
Let us assume that our assertion is true for the matrix $T_p$ $(p < n - h)$; we shall show that it is also true for $T_{p+1} = \|t_{ik}\|_0^{p}$. From the assumption it follows that there exist numbers $t_{p-1}, t_p, \ldots, t_{2p-2}$ such that, with $t_0 = \cdots = t_{p-2} = 0$,
$$T_p = \|t_{i+k}\|_0^{p-1}.$$
Here
$$|T_p| = (-1)^{\frac{p(p-1)}{2}}\,t_{p-1}^{\,p}. \qquad (149)$$
On the other hand, using Sylvester's determinant identity (see (28) on page 32), we find:
$$|T_p| = \frac{D_{h+p}}{D_h} = 0. \qquad (150)$$
Comparing (149) with (150), we obtain
$$t_{p-1} = 0. \qquad (151)$$
Furthermore, from (148)
$$t_{ik} = \frac{1}{D_h}\begin{vmatrix} s_0 & \cdots & s_{h-1} & s_{h+k}\\ \vdots & & \vdots & \vdots\\ s_{h-1} & \cdots & s_{2h-2} & s_{2h+k-1}\\ s_{h+i} & \cdots & s_{2h+i-1} & s_{2h+i+k}\end{vmatrix}. \qquad (152)$$
By the preceding lemma, it follows from (147) that the $(h+1)$-th row of the matrix $S = \|s_{i+k}\|_0^{n-1}$ is linearly dependent on the first $h$ rows:
$$s_q = \sum_{g=1}^{h}\alpha_gs_{q-g} \qquad (q = h, h+1, \ldots, h+n-1). \qquad (153)$$
Let $i, k \le p$ and $p \le i + k \le 2p - 1$. Then one of the numbers $i$ or $k$ is less than $p$. Without loss of generality, we assume that $i < p$. We expand, by (153), the last column of the determinant on the right-hand side of (152) and use the relations (152) again. This gives
$$t_{ik} = s_{2h+i+k} + \sum_{g=1}^{h}\alpha_g\left\{\frac{1}{D_h}\begin{vmatrix} s_0 & \cdots & s_{h-1} & s_{h+k-g}\\ \vdots & & \vdots & \vdots\\ s_{h-1} & \cdots & s_{2h-2} & s_{2h+k-g-1}\\ s_{h+i} & \cdots & s_{2h+i-1} & s_{2h+i+k-g}\end{vmatrix} - s_{2h+i+k-g}\right\} = s_{2h+i+k} + \sum_{g=1}^{h}\alpha_g\left(t_{i,k-g} - s_{2h+i+k-g}\right). \qquad (154)$$
By the induction hypothesis and (151), and since in (154) $i < p$, $k - g < p$, and $i + k - g \le 2p - 2$, we have $t_{i,k-g} = t_{i+k-g}$. Therefore all the $t_{ik}$ with $i + k < p$ are zero, and for $p \le i + k \le 2p - 1$ the value of $t_{ik}$, by (154), depends on $i + k$ only.

Thus, $T_{p+1}$ is a Hankel matrix, and all its elements $t_0, t_1, \ldots, t_{p-1}$ above the second diagonal are zero.

This proves the lemma.
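Lemma 2 can be checked on a small hypothetical example. The data give $D_1 \ne 0$ and $D_2 = D_3 = D_4 = 0$, so $h = 1$. The sketch below assumes exact arithmetic from the standard `fractions` module and a hypothetical helper `bordered` that forms the bordered minor in (148). It builds $T$ from (148) and tests the two claims of the lemma.

```python
from fractions import Fraction


def det(M):
    """Exact determinant of a square matrix, by Gaussian elimination over Q."""
    M = [[Fraction(v) for v in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return result


def hankel(s, n):
    return [[s[i + k] for k in range(n)] for i in range(n)]


def bordered(S, h, i, k):
    """S(1..h, h+i+1 / 1..h, h+k+1) from (148), with 0-based i, k."""
    rows = list(range(h)) + [h + i]
    cols = list(range(h)) + [h + k]
    return det([[S[a][b] for b in cols] for a in rows])


n = 4
s = [1, 1, 1, 1, 1, 1, 2]                     # hypothetical data
S = hankel(s, n)
D = [det([row[:p] for row in S[:p]]) for p in range(1, n + 1)]   # D_1, ..., D_n
h = 1
assert D[h - 1] != 0 and all(d == 0 for d in D[h:])            # hypothesis (147)

m = n - h
T = [[bordered(S, h, i, k) / D[h - 1] for k in range(m)] for i in range(m)]
is_hankel = all(T[i][k] == T[i + 1][k - 1] for i in range(m - 1) for k in range(1, m))
above_zero = all(T[i][k] == 0 for i in range(m) for k in range(m) if i + k < m - 1)
print(is_hankel, above_zero)                  # True True
```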


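Using Lemma 2, we shall prove the following theorem:

THEOREM 23: If the Hankel matrix $S = \|s_{i+k}\|_0^{n-1}$ has rank $r$ and if for some $h$ $(< r)$
$$D_h \ne 0, \qquad D_{h+1} = \cdots = D_r = 0,$$
then the principal minor of order $r$ formed from the first $h$ and the last $r - h$ rows and columns of $S$ is not zero:
$$D^{(r)} = S\begin{pmatrix}1 & \cdots & h & n-r+h+1 & n-r+h+2 & \cdots & n\\ 1 & \cdots & h & n-r+h+1 & n-r+h+2 & \cdots & n\end{pmatrix} \ne 0.$$

Proof. By the preceding lemma, the matrix
$$T = \|t_{ik}\|_0^{n-h-1}, \qquad t_{ik} = \frac{1}{D_h}S\begin{pmatrix}1 & \cdots & h & h+i+1\\ 1 & \cdots & h & h+k+1\end{pmatrix} \qquad (i, k = 0, 1, \ldots, n-h-1),$$
is a Hankel matrix in which all the elements above the second diagonal are zero. Therefore
$$|T| = (-1)^{\frac{(n-h)(n-h-1)}{2}}\,t_{n-h-1}^{\,n-h}.$$
On the other hand,$^{50}$ $|T| = \dfrac{D_n}{D_h} = 0$. Therefore $t_{n-h-1} = 0$, and the matrix $T$ has the form
$$T = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0\\ 0 & 0 & \cdots & 0 & u_{n-h-1}\\ 0 & 0 & \cdots & u_{n-h-1} & u_{n-h-2}\\ \vdots & \vdots & & \vdots & \vdots\\ 0 & u_{n-h-1} & \cdots & u_2 & u_1\end{pmatrix}.$$
The rank of $T$ must be $r - h$.$^{51}$ Therefore, for $r < n - 1$, the elements $u_{n-h-1} = \cdots = u_{r-h+1} = 0$ in the matrix $T$, and
$$T = \begin{pmatrix} O & O\\ O & U\end{pmatrix}, \qquad U = \begin{pmatrix} 0 & \cdots & 0 & u_{r-h}\\ 0 & \cdots & u_{r-h} & u_{r-h-1}\\ \vdots & & \vdots & \vdots\\ u_{r-h} & \cdots & u_2 & u_1\end{pmatrix} \qquad (u_{r-h} \ne 0),$$
where $U$ is of order $r - h$. But then, by Sylvester's identity (see page 32),
$$D^{(r)} = D_h\,T\begin{pmatrix} n-r+1 & \cdots & n-h\\ n-r+1 & \cdots & n-h\end{pmatrix} = D_h\,|U| \ne 0,$$
and this is what we had to prove.

50 By Sylvester's determinant identity (see (28) on p. 32).

51 From Sylvester's identity it follows that all the minors of $T$ whose order exceeds $r - h$ are zero. On the other hand, $S$ contains some non-vanishing minor of order $r$ bordering $D_h$. Hence the corresponding minor of order $r - h$ of $T$ is different from zero.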
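Theorem 23 can be seen at work on a hypothetical rank-2 Hankel matrix of order 5 with $D_1 \ne 0$ and $D_2 = 0$. The last leading minor $D_r$ vanishes, but the minor $D^{(r)}$ built from the first $h$ and the last $r - h$ indices does not. The sketch assumes the standard `fractions` module and hypothetical helpers `det`, `rank`, and `hankel`. Indices in the code are 0-based.

```python
from fractions import Fraction


def det(M):
    """Exact determinant of a square matrix, by Gaussian elimination over Q."""
    M = [[Fraction(v) for v in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return result


def rank(M):
    """Exact rank of a matrix, by Gaussian elimination over Q."""
    M = [[Fraction(v) for v in row] for row in M]
    rk = 0
    for c in range(len(M[0])):
        p = next((r for r in range(rk, len(M)) if M[r][c] != 0), None)
        if p is None:
            continue
        M[rk], M[p] = M[p], M[rk]
        for r in range(len(M)):
            if r != rk and M[r][c] != 0:
                f = M[r][c] / M[rk][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[rk])]
        rk += 1
    return rk


def hankel(s, n):
    return [[s[i + k] for k in range(n)] for i in range(n)]


n = 5
s = [1] * 8 + [3]                                  # hypothetical data, rank 2
S = hankel(s, n)
r = rank(S)
D = [Fraction(1)] + [det([row[:p] for row in S[:p]]) for p in range(1, n + 1)]  # D_0..D_n
h = max(p for p in range(r) if D[p] != 0)          # D_h != 0, D_{h+1} = ... = D_r = 0
idx = list(range(h)) + list(range(n - r + h, n))   # first h and last r - h indices
D_r_star = det([[S[a][b] for b in idx] for a in idx])
print(r, h, D[r], D_r_star)                        # 2 1 0 2
```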


Let us consider a real$^{52}$ Hankel form $S(x,x) = \sum_{i,k=0}^{n-1} s_{i+k}x_ix_k$ of rank $r$. We denote by $\pi$, $\nu$, and $\sigma$, respectively, the number of positive squares, the number of negative squares, and the signature of the form:
$$\pi + \nu = r, \qquad \sigma = \pi - \nu = r - 2\nu.$$
By the theorem of Jacobi (p. 303) these values can be determined from the signs of the successive minors
$$D_0 = 1, D_1, D_2, \ldots, D_{r-1}, D_r \qquad (155)$$
by the formulas
$$\pi = P(1, D_1, \ldots, D_r), \qquad \nu = V(1, D_1, \ldots, D_r),$$
$$\sigma = P(1, D_1, \ldots, D_r) - V(1, D_1, \ldots, D_r) = r - 2V(1, D_1, \ldots, D_r). \qquad (156)$$
These formulas become inapplicable when the last term in (155) is zero or when any three consecutive terms are zero (see § 3). However, as Frobenius has shown, for Hankel forms there is a rule that enables us to use the formulas (156) in the general case:


THEOREM 24 (Frobenius): For a real Hankel form $S(x,x) = \sum_{i,k=0}^{n-1} s_{i+k}x_ix_k$ of rank $r$ the values of $\pi$, $\nu$, and $\sigma$ can be determined by the formulas (156) provided that

1) for
$$D_h \ne 0, \qquad D_{h+1} = \cdots = D_r = 0 \qquad (h < r) \qquad (157)$$
$D_r$ is replaced by $D^{(r)}$, where
$$D^{(r)} = S\begin{pmatrix}1 & \cdots & h & n-r+h+1 & \cdots & n\\ 1 & \cdots & h & n-r+h+1 & \cdots & n\end{pmatrix};$$

2) in any group of $p$ consecutive zero determinants
$$(D_h \ne 0) \qquad D_{h+1} = D_{h+2} = \cdots = D_{h+p} = 0 \qquad (D_{h+p+1} \ne 0) \qquad (158)$$
a sign is attributed to the zero determinants according to the formula
$$\operatorname{sign} D_{h+j} = (-1)^{\frac{j(j-1)}{2}}\operatorname{sign} D_h \qquad (j = 1, 2, \ldots, p). \qquad (159)$$

The values of $P$, $V$, and $P - V$ corresponding to the group (158) are then:$^{53}$
$$\begin{array}{l|c|c} & p \text{ odd} & p \text{ even}\\ \hline P_{h,p} = P(D_h, D_{h+1}, \ldots, D_{h+p+1}) & \dfrac{p+1}{2} & \dfrac{p+1+\varepsilon}{2}\\[6pt] V_{h,p} = V(D_h, D_{h+1}, \ldots, D_{h+p+1}) & \dfrac{p+1}{2} & \dfrac{p+1-\varepsilon}{2}\\[6pt] P_{h,p} - V_{h,p} & 0 & \varepsilon \end{array} \qquad \varepsilon = (-1)^{\frac{p}{2}}\operatorname{sign}\frac{D_{h+p+1}}{D_h}. \qquad (160)$$

52 In the preceding Lemmas 1 and 2 and in Theorem 23, the ground field can be taken as an arbitrary number field, in particular, the field of complex or of real numbers.

53 The formulas (159) and (160) are also applicable to (157), but we have to set $p = r - h - 1$ and interpret $D_{h+p+1}$ not as $D_r = 0$, but as $D^{(r)} \ne 0$.

Proof. To begin with, we consider the case where $D_r \ne 0$. Then the forms $S(x,x) = \sum_{i,k=0}^{n-1} s_{i+k}x_ix_k$ and $S_r(x,x) = \sum_{i,k=0}^{r-1} s_{i+k}x_ix_k$ have not only the same rank $r$, but also the same signature $\sigma$. For let $S(x,x) = \sum_{i=1}^{r}\varepsilon_iZ_i^2$, where the $Z_i$ are real linear forms and $\varepsilon_i = \pm 1$ $(i = 1, 2, \ldots, r)$. We set $x_r = x_{r+1} = \cdots = x_{n-1} = 0$. Then the forms $S(x,x)$ and $Z_i$ go over, respectively, into $S_r(x,x)$ and $\tilde Z_i$ $(i = 1, 2, \ldots, r)$, and $S_r(x,x) = \sum_{i=1}^{r}\varepsilon_i\tilde Z_i^2$. Hence $S_r(x,x)$ has the same number of positive and negative squares as $S(x,x)$.$^{54}$ Thus the signature of $S_r(x,x)$ is $\sigma$.

We now vary the parameters $s_0, s_1, \ldots, s_{2r-2}$ continuously in such a way that for the new parameter values $s_0^*, s_1^*, \ldots, s_{2r-2}^*$ all the terms of the sequence$^{55}$
$$1, D_1^*, D_2^*, \ldots, D_r^* \qquad (D_q^* = |s_{i+k}^*|_0^{q-1};\ q = 1, 2, \ldots, r)$$
are different from zero and that in the process of variation none of the non-zero determinants (155) vanishes.$^{56}$

Since the rank of $S_r(x,x)$ does not change during the variation, its signature also remains unchanged (see p. 309). Therefore
$$\sigma = P(1, D_1^*, \ldots, D_r^*) - V(1, D_1^*, \ldots, D_r^*). \qquad (161)$$
If $D_i \ne 0$ for some $i$, then $\operatorname{sign} D_i^* = \operatorname{sign} D_i$. Therefore the whole problem reduces to determining the variations in sign among those $D_i^*$ that correspond to $D_i = 0$. More precisely, for every group of the form (158) we have to determine
$$P(D_h^*, D_{h+1}^*, \ldots, D_{h+p+1}^*) - V(D_h^*, D_{h+1}^*, \ldots, D_{h+p+1}^*).$$
For this purpose we set:
$$t_{ik} = \frac{1}{D_h}\begin{vmatrix} s_0 & \cdots & s_{h-1} & s_{h+k}\\ \vdots & & \vdots & \vdots\\ s_{h-1} & \cdots & s_{2h-2} & s_{2h+k-1}\\ s_{h+i} & \cdots & s_{2h+i-1} & s_{2h+i+k}\end{vmatrix} \qquad (i, k = 0, 1, \ldots, p).$$
By Lemma 2, the matrix $T = \|t_{ik}\|_0^{p}$ is a Hankel matrix and all its elements above the second diagonal are zero, so that $T$ has the form
$$T = \begin{pmatrix} 0 & 0 & \cdots & 0 & t_p\\ 0 & 0 & \cdots & t_p & *\\ \vdots & \vdots & & \vdots & \vdots\\ 0 & t_p & \cdots & * & *\\ t_p & * & \cdots & * & *\end{pmatrix}. \qquad (162)$$
We denote the successive minors of $T$ by $\hat D_1, \hat D_2, \ldots, \hat D_{p+1}$:
$$\hat D_q = |t_{ik}|_0^{q-1} \qquad (q = 1, 2, \ldots, p+1).$$

54 The linear forms $\tilde Z_1, \tilde Z_2, \ldots, \tilde Z_r$ are linearly independent, because the quadratic form $S_r(x,x) = \sum_{i=1}^{r}\varepsilon_i\tilde Z_i^2$ is of rank $r$ ($D_r \ne 0$).

55 In this section, the asterisk * does not indicate the adjoint matrix.

56 Such a variation of the parameters is always possible, because in the space of the parameters $s_0, s_1, \ldots, s_{2r-2}$ an equation of the form $D_q = 0$ determines a certain algebraic hypersurface. If a point lies in some such hypersurfaces, then it can always be approximated by arbitrarily close points that do not lie in these hypersurfaces.


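Together with $T$, we consider the matrix
$$T^* = \|t_{ik}^*\|_0^{p},$$
where
$$t_{ik}^* = \frac{1}{D_h^*}\begin{vmatrix} s_0^* & \cdots & s_{h-1}^* & s_{h+k}^*\\ \vdots & & \vdots & \vdots\\ s_{h-1}^* & \cdots & s_{2h-2}^* & s_{2h+k-1}^*\\ s_{h+i}^* & \cdots & s_{2h+i-1}^* & s_{2h+i+k}^*\end{vmatrix} \qquad (i, k = 0, 1, \ldots, p),$$
and the corresponding determinants
$$\hat D_q^* = |t_{ik}^*|_0^{q-1} \qquad (q = 1, 2, \ldots, p+1).$$
By Sylvester's determinant identity,
$$D_{h+q}^* = D_h^*\hat D_q^* \qquad (q = 1, 2, \ldots, p+1).$$
Therefore
$$P(D_h^*, D_{h+1}^*, \ldots, D_{h+p+1}^*) - V(D_h^*, D_{h+1}^*, \ldots, D_{h+p+1}^*) = P(1, \hat D_1^*, \ldots, \hat D_{p+1}^*) - V(1, \hat D_1^*, \ldots, \hat D_{p+1}^*) = \sigma^*, \qquad (163)$$
where $\sigma^*$ is the signature of the form
$$T^*(x,x) = \sum_{i,k=0}^{p} t_{ik}^*x_ix_k.$$
Together with $T^*(x,x)$, we consider the forms
$$T(x,x) = \sum_{i,k=0}^{p} t_{ik}x_ix_k \qquad\text{and}\qquad T^{**}(x,x) = t_p(x_0x_p + x_1x_{p-1} + \cdots + x_px_0).$$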


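The matrix $T^{**}$ is obtained from $T$ (see (162)) by replacing all the elements below the second diagonal by zeros. We denote the signatures of $T(x,x)$ and $T^{**}(x,x)$ by $\hat\sigma$ and $\sigma^{**}$. The forms $T^*(x,x)$ and $T^{**}(x,x)$ are obtained from $T(x,x)$ by variations of the coefficients during which the rank of the form does not change ($|T^{**}| = |T| = \frac{D_{h+p+1}}{D_h} \ne 0$). Therefore the signatures of $T(x,x)$, $T^*(x,x)$, and $T^{**}(x,x)$ must also be equal:
$$\hat\sigma = \sigma^* = \sigma^{**}. \qquad (164)$$
But
$$T^{**}(x,x) = \begin{cases} 2t_p\left(x_0x_p + \cdots + x_{\frac{p-1}{2}}x_{\frac{p+1}{2}}\right) & \text{for odd } p,\\[4pt] t_p\left[2\left(x_0x_p + \cdots + x_{\frac{p}{2}-1}x_{\frac{p}{2}+1}\right) + x_{\frac{p}{2}}^2\right] & \text{for even } p.\end{cases}$$
Every product of the form $2x_\alpha x_\beta$ with $\alpha \ne \beta$ can be replaced by a difference of squares $\left(\frac{x_\alpha + x_\beta}{\sqrt 2}\right)^2 - \left(\frac{x_\alpha - x_\beta}{\sqrt 2}\right)^2$. We can therefore obtain a decomposition of $T^{**}(x,x)$ into independent real squares, and we have
$$\sigma^{**} = \begin{cases} 0 & \text{for odd } p,\\ \operatorname{sign} t_p & \text{for even } p.\end{cases} \qquad (165)$$
On the other hand, from (162),
$$\frac{D_{h+p+1}}{D_h} = |T| = (-1)^{\frac{p(p+1)}{2}}\,t_p^{\,p+1}. \qquad (166)$$
From (163), (164), (165), and (166), it follows that:
$$P(D_h^*, D_{h+1}^*, \ldots, D_{h+p+1}^*) - V(D_h^*, D_{h+1}^*, \ldots, D_{h+p+1}^*) = \begin{cases} 0 & \text{for odd } p,\\ \varepsilon & \text{for even } p,\end{cases} \qquad (167)$$
where
$$\varepsilon = (-1)^{\frac{p}{2}}\operatorname{sign}\frac{D_{h+p+1}}{D_h}.$$
Since
$$P(D_h^*, D_{h+1}^*, \ldots, D_{h+p+1}^*) + V(D_h^*, D_{h+1}^*, \ldots, D_{h+p+1}^*) = p + 1, \qquad (168)$$
the table (160) can be deduced from (167) and (168).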
Now let D,=0, Then for someh<r 


D, ~ 0, Digi" °° =D,=09. 


348 X. QuapraTic aNpD HERMITIAN FORMS 


In this case, by Theorem 25, 


po=s( i} sore hth | eo, 
Ll.c.k n—r+h+l...n 


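The case to be considered reduces to the preceding case by renumbering the variables in the quadratic form $S(x, x) = \sum_{i,k=0}^{n-1} s_{i+k}\, x_i x_k$. We set:

$$\tilde x_0 = x_0, \ \dots, \ \tilde x_{h-1} = x_{h-1}, \qquad \tilde x_h = x_{n-r+h}, \ \dots, \ \tilde x_{r-1} = x_{n-1},$$
$$\tilde x_r = x_h, \ \dots, \ \tilde x_{n-1} = x_{n-r+h-1}.$$

Then $S(x, x) = \sum_{i,k=0}^{n-1} \tilde s_{ik}\, \tilde x_i \tilde x_k$.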
Starting from the structure of the matrix $T$ on page 346 and using the corresponding relations obtained from Sylvester's determinant identity, we find that the sequence $1, \tilde D_1, \tilde D_2, \dots, \tilde D_r$ of leading principal minors of $\|\tilde s_{ik}\|_0^{n-1}$ is obtained from $1, D_1, D_2, \dots, D_r$ by replacing the single element $D_r$ by $D^{(r)}$.
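The renumbering can be tried out numerically. The Python sketch below is an illustration only; all function names are hypothetical. `renumbering` lists the 0-based old index of each new variable, following the substitution above. `renumbered_minors` forms the matrix of $\tilde s_{ik} = s_{\text{perm}[i] + \text{perm}[k]}$ and returns its leading principal minors in exact rational arithmetic. The rank $r$ and the index $h$ are assumed to be supplied by the caller.

```python
from fractions import Fraction
from typing import List, Sequence

Matrix = List[List[Fraction]]


def det(m: Matrix) -> Fraction:
    """Exact determinant by Gaussian elimination (1 for the empty matrix)."""
    a = [row[:] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if a[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result              # a row swap flips the sign
        result *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for c in range(col, n):
                a[i][c] -= factor * a[col][c]
    return result


def leading_minors(m: Matrix, r: int) -> List[Fraction]:
    """[1, D_1, ..., D_r]: determinants of the upper-left q x q blocks."""
    return [det([row[:q] for row in m[:q]]) for q in range(r + 1)]


def renumbering(n: int, r: int, h: int) -> List[int]:
    """0-based old index of each new variable: x~_j = x_{perm[j]}."""
    return (list(range(h))                    # x~_0 .. x~_{h-1} = x_0 .. x_{h-1}
            + list(range(n - r + h, n))       # x~_h .. x~_{r-1} = x_{n-r+h} .. x_{n-1}
            + list(range(h, n - r + h)))      # x~_r .. x~_{n-1} = x_h .. x_{n-r+h-1}


def renumbered_minors(s: Sequence[int], r: int, h: int) -> List[Fraction]:
    """Leading minors 1, D~_1, ..., D~_r after renumbering the variables."""
    n = (len(s) + 1) // 2                     # s holds s_0, ..., s_{2n-2}
    perm = renumbering(n, r, h)
    s_tilde = [[Fraction(s[perm[i] + perm[k]]) for k in range(n)]
               for i in range(n)]
    return leading_minors(s_tilde, r)
```

As an example, take $n = 4$ with $s_0 = 1$, $s_6 = -1$ and all other $s_j = 0$. The form has rank $r = 2$, with $D_1 = 1$ and $D_2 = 0$, so $h = 1$. The original sequence is $1, 1, 0$. `renumbered_minors([1, 0, 0, 0, 0, 0, -1], 2, 1)` returns $1, 1, -1$, so only $D_2$ has changed, and it has been replaced by $D^{(2)} = -1$.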
We leave it to the reader to verify that the table (160) corresponds to 
the attribution of signs to the zero determinants given by (159). 
This completes the proof of the theorem. 
Note. It follows from (166) that for odd $p$ ($p$ is the number of zero determinants in the group (158))

$$\operatorname{sign} \frac{D_{h+p+1}}{D_h} = (-1)^{\frac{p+1}{2}}.$$

In particular, for $p = 1$ we have $D_h D_{h+2} < 0$. In this case, we can omit $D_{h+1}$ in computing $V(1, D_1, \dots, D_r)$, thus obtaining Gundelfinger's rule. In exactly the same way, we obtain Frobenius' rule (see page 304) from (160) for $p = 2$.
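The complete rule can be sketched in Python. This is an illustration under stated assumptions, not the author's procedure, and the name `hankel_signature` is hypothetical. The input is the list $s_0, \dots, s_{2n-2}$ of integer coefficients, and determinants are computed exactly with `fractions.Fraction`. Each group of $p$ zero determinants between nonzero $D_h$ and $D_{h+p+1}$ contributes $0$ for odd $p$ and $\varepsilon$ for even $p$, as in (167). With $p = 0$ this reduces to the ordinary count of a permanence ($+1$) or a variation ($-1$). When $D_r = 0$, $D_r$ is replaced by $D^{(r)}$, following the reduction described above.

```python
from fractions import Fraction
from typing import List, Sequence

Matrix = List[List[Fraction]]


def det(m: Matrix) -> Fraction:
    """Exact determinant by Gaussian elimination (1 for the empty matrix)."""
    a = [row[:] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if a[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result              # a row swap flips the sign
        result *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for c in range(col, n):
                a[i][c] -= factor * a[col][c]
    return result


def rank(m: Matrix) -> int:
    """Rank via row echelon form over the rationals."""
    a = [row[:] for row in m]
    rows, cols = len(a), (len(a[0]) if a else 0)
    r = 0
    for col in range(cols):
        pivot = next((i for i in range(r, rows) if a[i][col] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, rows):
            factor = a[i][col] / a[r][col]
            for c in range(col, cols):
                a[i][c] -= factor * a[r][c]
        r += 1
    return r


def sign(x: Fraction) -> int:
    return (x > 0) - (x < 0)


def hankel_signature(s: Sequence[int]) -> int:
    """Signature of sum s[i+k] x_i x_k (i, k = 0..n-1) from the minors D_q."""
    n = (len(s) + 1) // 2
    H = [[Fraction(s[i + k]) for k in range(n)] for i in range(n)]
    r = rank(H)
    D = [det([row[:q] for row in H[:q]]) for q in range(r + 1)]  # D[0] = 1
    if D[r] == 0:
        # Reduction for D_r = 0: h is the last index < r with D_h != 0, and
        # D_r is replaced by the principal minor D^(r) on the 0-based rows and
        # columns 0..h-1 and n-r+h..n-1.
        h = max(q for q in range(r) if D[q] != 0)
        idx = list(range(h)) + list(range(n - r + h, n))
        D[r] = det([[H[i][k] for k in idx] for i in idx])
        if D[r] == 0:
            raise ValueError("expected D^(r) != 0")
    sigma, h = 0, 0
    while h < r:                          # invariant: D[h] != 0
        j = h + 1
        while D[j] == 0:                  # skip the group of zero determinants
            j += 1
        p = j - h - 1                     # number of zeros between D_h and D_j
        if p % 2 == 0:                    # odd p contributes 0, even p gives eps
            sigma += (-1) ** (p // 2) * sign(D[j]) * sign(D[h])
        h = j
    return sigma
```

For example, `hankel_signature([0, 0, 1, 0, 0])` gives 1. Here $D_1 = D_2 = 0$ and $D_3 = -1$, so $p = 2$ and $\varepsilon = (-1)^1 \operatorname{sign}(-1) = 1$. This agrees with the eigenvalues $1, 1, -1$ of the matrix. The exact rational arithmetic is deliberate: the rule depends on which minors vanish exactly, and floating-point determinants could misreport a zero.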


INDEX

[Numbers in italics refer to Volume Two]

ABSOLUTE CONCEPTS, 184
Addition of congruences, 182
Addition of operators, 57
Adjoint matrix, 82
Adjoint operator, 265
Algebra, 17
Algorithm of Gauss, 23ff.
generalized, 45
Angle between vectors, 242
Axes, principal, 309
reduction to, 309

Basis(es), 51
characteristic, 73
coordinates of vector in, 53
Jordan, 201
lower, 202
orthonormal, 242, 245
Bessel, inequality of, 259
Bézout, generalized theorem of, 81
Binet-Cauchy formula, 9
Birkhoff, G. D., 147
Block, of matrix, 41
diagonal, isolated, 75
Jordan, 151
Block multiplication of matrices, 42
Bundle of vectors, 183
Bunyakovskii's inequality, 255

CARTAN, theorem of, 4
Cauchy, formula of Binet-, 9
system of, 115
Cauchy identity, 10
Cauchy index, 174, 216
Cayley, formulas of, 279
Cayley-Hamilton theorem, 83, 197
Cell, of matrix, 41
Chain, see Jordan, Markov, Sturm
Characteristic basis, 73
Characteristic direction, 71
Characteristic equation, 70, 310, 338
Characteristic matrix, 82
Characteristic polynomial, 71, 82


Characterization of root, minimal, 319 
maximal-minimal, 321, 322 
Chebyshev, 173, 240 
polynomials of, 259 
Chebyshev-Markov, formula of, 248 
theorem of, 247 
Chetaev, 121 
Chipart, 173, 221 
Coefficients of Fourier, 261 
Coefficients of influence, reduced, 111
Column, principal, 338 
Column matrix, 2 
Columns, Jordan chains of, 165 
Components, of matrix, 105 
of operator, hermitian, 268 
skew-symmetric, 281 
symmetric, 281 
Compound matrix, 19ff., 20 
Computation of powers of matrix, 109 
Congruences, 181, 182 
Constraint, 320 
Convergence, 110, 112 
Coordinates, transformation of, 59 
of vector, 53 
Coordinate transformation, matrix of, 60


D’ALEMBERT-EULER, theorem of, 286 
Danilevskii, 214 
Decomposition, of matrix into triangular factors, 33ff.
polar, of operator, 276, 286; 6 
of space, 248 
Defect of vector space, 64 
Derivative, multiplicative, 133 
Determinant identity of Sylvester, 32, 33 
Determinant of square matrix, 1 
Diagonal matrix, 3
Dilatation of space, 287 
Dimension, of matrix, 1 
of vector space, 51 
Direction, characteristic, 71 
Discriminant of form, 333 


Divisors, elementary, 142, 144, 194
admissible, 238
geometrical theory of, 175
infinite, 27
Dmitriev, 87
Domain of stability, 232
Dynkin, 87

EIGENVALUE, 69
Elements of matrix, 1
Elimination method of Gauss, 23ff.
Equivalence, of matrices, 61, 132, 133
of pencils, strict, 24
Ergodic theorem for Markov chains, 95
Erugin, theorem of, 122
Euler-D'Alembert, theorem of, 286

FACTOR SPACE, 183
Faddeev, method of, 87
Field, 1
Forces, linear superposition of, 28
Form, bilinear, 294
Hankel, 338; 205
hermitian, 244, 331
bilinear, 332
canonical form of, 337
negative definite, 337
negative semidefinite, 336
pencil of, see pencil
positive definite, 337
positive semidefinite, 336
rank of, 333
signature of, 334
singular, 333
quadratic, 246, 294
definite, 305
discriminant of, 294
rank of, 296
real, 294
reduction of, 299ff.
reduction to principal axes, 309
restricted, 306
semidefinite, 304
signature of, 296, 298
singular, 294
Fourier series, 261
Frobenius, 304, 339, 343; 53
theorem of, 343; 53
Function, entire, 169
left value of, 81

GANTMACHER, 103
Gauss, algorithm of, 23ff.
generalized, 45
elimination method of, 23ff.
Gaussian form of matrix, 39
Golubchikov, 124
Governors, 172, 233
Gram, criterion of, 247
Gramian, 247, 251
Group, 18
unitary, 268
Gundelfinger, 304

HADAMARD INEQUALITY, 252
generalized, 254
Hamilton-Cayley theorem, 83, 197
Hankel form, 338; 205
Hankel matrix, 338; 205
Hermite, 172, 202, 210
Hermite-Biehler theorem, 228
Hurwitz, 173, 190, 210
Hurwitz matrix, 190
Hyperlogarithm, 169

IDENTITY OPERATOR, 66
Imprimitivity, index of, 80
Ince, 147
Inertia, law of, 297, 334
Integral, multiplicative, 132, 138
product, 132
Invariant plane, of operator, 283

JACOBI, formula of, 302, 336
identity of, 114
method of, 300
theorem of, 303
Jacobi matrix, 99
Jordan basis, 201
Jordan block, 151
Jordan chains of columns, 165
Jordan form of matrix, 152, 201, 202
Jordan matrix, 152, 201

KARPELEVICH, 87
Kernel of λ-matrix, 39
Kolmogorov, 83, 87, 92
Kotelyanskii, 108
lemma of, 71
Krein, 221, 250
Kronecker, 75; 25, 37, 40
Krylov, 203
transformation of, 206

LAGRANGE, method of, 299
Lagrange interpolation polynomial, 101
Lagrange-Sylvester interpolation polynomial, 97
λ-matrix, 130
kernel of, 39


Lappo-Danilevskii, 168, 170, 171 
Left value, 81 
Legendre polynomials, 258 
Liénard, 173, 221 
Liénard-Chipart stability criterion, 221 
Limit of sequence of matrices, 33 
Linear (in)dependence of vectors, 51
Linear transformation, 3 
Logarithm of matrix, 239 
Lyapunov, 173, 185 
criterion of, 120 
equivalence in the sense of, 118 
theorem of, 187 
Lyapunov matrix, 117 
Lyapunov transformation, 117 


MACMILLAN, 115
Mapping, affine, 245 
Markov, 173, 240 
theorem of, 242 
Markov chain, acyclic, 88 
cyclic, 88 
fully regular, 88 
homogeneous, 838 
period of, 96 
(ir)reducible, 88
regular, 88 
Markov parameters, 283, 234 
Matricant, 127 
Matrices, addition of, 4 
group property, 18 
annihilating polynomial of, 89


applications to differential equations, 116ff.
congruence of, 296 
difference of, 5 
equivalence of, 132, 133 
equivalent, 61ff. 
left-equivalence of, 132, 133 
limit of sequence of, 33 
multiplication on left by H, 14 
product of, 6 
quotient of, 17 
rank of product, 12 
similarity of, 67 
unitary similarity of, 242
with same real part of spectrum, 122 
Matrix, adjoint, 82, 266

reduced, 90 
blocks of, 41 


canonical form of, 63, 135, 136, 139, 141, 152, 192, 201, 202, 264, 265
cells of, 41 
characteristic, 82 
characteristic polynomial of, 82 




Matrix, column, 2 


commuting, 7 
companion, 149 
completely reducible, 81
complex, 1ff.
orthogonal, normal form of, 23 
representation of as product, 6 
skew-symmetric, normal form of, 18 
symmetric, normal form of, 212 
components of, 105 
compound, 19ff., 20 
computation of power of, 109 
constituent, 105 
of coordinate transformation, 60 
cyclic form of, 54 
decomposition into triangular factors, 
33 ff. 
derivative of, 117 
determinant of, 1, 5 
diagonal, 3 
multiplication by, 8 
diagonal form of, 152 
dimension of, 1 
elementary, 132 
elementary divisors of, 142, 144, 194 
elements of, 1 
function of, 95ff.
defined on spectrum, 96 
fundamental, 73 
Gaussian form of, 39 
Hankel, 338; 205 
projective, 20 
Hurwitz, 190 
idempotent, 226 
infinite, rank of, 239 
integral, 126; 113 
normalized, 114 
invariant polynomials of, 139, 144, 194 
inverse of, 15 
minors of, 19ff. 
irreducible, 50 
(im) primitive, 80 
Jacobi, 99 
Jordan form of, 152, 201, 202 
λ, 130
and linear operator, 56 
logarithm of, 239 
Lyapunov, 117 
minimal polynomial of, 89
uniqueness of, 90 
minor of, 2 
principal, 2 
multiplication of, by number, 5 
by matrix, 17 




Matrix, nilpotent, 226
non-negative, 50
totally, 98
non-singular, 15
normal, 269
normal form of, 150, 192, 201, 202
notation for, 1
order of, 1
orthogonal, 263
oscillatory, 103
partitioned, 41, 42
permutable, 7
permutation of, 50
polynomial, see polynomial matrix
polynomials in, permutability of, 13
positive, 50
spectra of, 53
totally, 98
power of, 12
computation of, 109
power series in, 113
principal minor of, 2
quasi-triangular, 43
rank of, 2
reducible, 50, 51
normal form of, 75
representation as product, 264
root of non-singular, 233
root of singular, 234ff., 239
Routh, 191
row, 2
of simple structure, 73
singular, 15
skew-symmetric, 19
square, 1
square root of, 239
stochastic, 83
fully regular, 88
regular, 88
spur of, 87
subdiagonal of, 13
superdiagonal of, 13
symmetric, 19
trace of, 87
transformation of coordinate, 60
transforming, 35, 60
transpose of, 19
triangular, 18, 218; 154
unit, 12
unitary, 263, 269
unitary, representation of as product, 5
upper quasi-triangular,
upper triangular, 18
Matrix addition, properties of, 4
Matrix equations, 215ff.
uniqueness of solution, 16
Matrix multiplication, 6, 7
Matrix polynomials, 76
left quotient of, 78
multiplication of, 77
Maxwell, 172
Mean, convergence in, of series, 260
Metric, 242
euclidean, 245
hermitian, 243, 244
positive definite, 243
positive semidefinite, 243
Minimal indices for columns, 38
Minor, 2
almost principal, 102
of zero density, 104
Modulus, left, 275
Moments, problem of, 236, 237
Motion, of mechanical system, 12
of point, 121
stability of, 125
asymptotic, 125

NAIMARK, 221, 233, 250
Nilpotency, index of, 226
Norm, left, 275
of vector, 243
Null vector, 52
Nullity of vector space, 64
Number space, n-dimensional, 52

OPERATIONS, elementary, 134
Operator (linear), 55, 66
adjoint, 265
decomposition of, 281
hermitian, 268
positive definite, 274
positive semidefinite, 274
projective, 20
spectrum of, 272
identity, 66
invariant plane of, 283
matrix corresponding to, 56
normal, 268
positive definite, 280
positive semidefinite, 280
normal, 280
orthogonal, of first kind, 281
(im)proper, 281
of second kind, 281
polar decomposition of, 276, 286
real, 282
semidefinite, 274, 280




Operator (linear), of simple structure, 72 
skew-symmetric, 280 
square root of, 275 
symmetric, 280 
transposed, 280 
unitary, 268
spectrum of, 273
Operators, addition of, 57 
multiplication of, 58
Order of matrix, 1 
Orlando, formula of, 196 
Orthogonal complement, 266 
Orthogonalization, 254 
Oscillations, small, of system, 326


PARAMETERS, homogeneous, 26 
Markov, 283, 284 
Parseval, equality of, 261 
Peano, 127 
Pencil of hermitian forms, 338 
characteristic equation of, 338 
characteristic values of, 338 
principal vector of, 338 
Pencil(s) of matrices, canonical form of, 37, 39
congruent, 41
elementary divisors of, infinite, 27 
rank of, 29 
regular, 25 
singular, 25 
strict equivalence of, 24 
Pencil of quadratic forms, 310 
characteristic equation of, 310 
characteristic value of, 310 
principal column of, 310 
principal matrix of, 312 
principal vector of, 310 
Period, of Markov chain, 96 
Permutation of matrix, 50 
Perron, 53 
formula of, 116 
Petrovskii, 113 
Polynomial(s), annihilating, 176, 177 
minimal, 176
of square matrix, 89 
of Chebyshev, 259 
characteristic, 71
interpolation, 97, 101, 103 
invariant, 139, 144, 194 
of Legendre, 258 
matrix, see matrix polynomials 
minimal, 89, 176, 177 
monic, 176 
scalar, 76 
positive pair of, 227 


Polynomial matrix, 76, 130


elementary operations on, 130, 131 
regular, 76 
order of, 7& 
Power of matrix, 12
Probability, absolute, 93 
limiting, 94
mean limiting, 96 
transition, 82 
final, 88 
limiting, 88 
mean limiting, 96 
Product, inner, of vectors, 243 
scalar, of vectors, 242, 243 
of operators, 58 
of sequences, 6 
Pythagoras, theorem of, 244 


QUASI-ERGODIC THEOREM, 95 
Quasi-triangular matrix, 43 
Quotients of matrices, 17 


RANK, of infinite matrix, 239
of matrix, 2 
of pencil, 29
of vector space, 64 
Relative concepts, 184 
Right value, 81 
Ring, 17 
Romanovskii, 83 
Root of matrix, 233, 234ff., 239 
Rotation of space, 287 
Routh, 173, 201 
criterion of, 180 
Routh-Hurwitz, criterion of, 194 
Routh matrix, 191 
Routh scheme, 179 
Row matrix, 2 


SCHLESINGER, 133 
Schur, formulas of, 46 
Schwarz, inequality of, 255 
Sequence of vectors, 256, 269 
Series, convergence of, 260 
fundamental, of solutions, 38
Signature of quadratic form, 296, 298 
Similarity of matrices, 67 
Singularity, 143
Smirnov, 171
Space, coefficient, 232 
decomposition of, 177, 248 
dilatation of, 287 
euclidean, 242, 245 
extension of, to unitary space, 937 
factor, 183 




Space, rotation of, 287 
unitary, 242, 243 
as extension of euclidean, 282 
Spectrum, 96, 272, 273; 53 
Spur, 87 
Square(s), independent, 297 
positive, 334 
Stability, criterion of, 221 
domain of, 232 
of motion, 125 
of solution of linear system, 129
States, essential, 92 
limiting, 92 
non-essential, 92 
Stieltjes, theorem of, 232 
Stodola, 173
Sturm, theorem of, 175 
Sturm chain, 175 
generalized, 176
Subdiagonal, 13 
Subspace, characteristic, 71 
coordinate, 51 
cyclic, 185 
generated by vector, 185 
invariant, 178 
vector, 63 
Substitution, integral, 143, 169 
Suleimanova, 87 
Superdiagonal, 13 
Sylvester, identity of, 32, 33 
inequality of, 66 


Systems of differential equations, application of matrices to, 116ff.
equivalent, 118 
reducible, 118 
regular, 1:1, 168
singularity of, 143 
stability of solution, 129 
Systems of vectors, bi-orthogonal, 267 
orthonormal, 245 


TRACE, 87 
Transformation, linear, 3 
of coordinates, 59 
orthogonal, 242, 263 
unitary, 242, 263 
written as matrix equation, 7 
Lyapunov, 117 
Transforming matrix, 35, 60 
Transpose, 19, 280 
Transposition, 18 




UNIT SUM OF SQUARES, 314 
Unit sphere, 315 
Unit vector, 244 


VALUE(S), characteristic, maximal, 53
extremal properties of, 317 
latent, 69 
left and right, of function, 81 
proper, 69 
Vector(s), 51 
angle between, 242 
bundle of, 183 
Jordan chain of, 202 
complex, 282 
congruence of, 181 
extremal, 55 
inner product of, 243 
Jordan chain of, 201 
latent, 69 
length of, 242, 243 
linear dependence of, 51 
test for, 251 
modulo 7, 183
linear independence of, 51 
norm of, 243 
normalized, 244; 66 
null, 52
orthogonal, 244, 248 
orthogonalization of sequence, 256 
principal, 318, 338 
proper, 69 
projecting, 248 
projection of, orthogonal, 248 
real, 282
scalar product of, 242, 243 
systems of, bi-orthogonal, 267 
orthonormal, 245 
unit, 244 
Vector space, 50ff., 51 
basis of, 51 
defect of, 64 
dimension of, 51 
finite-dimensional, 51 
infinite-dimensional, 51 
nullity of, 64 
rank of, 64 
Vector subspace, 63
Volterra, 183, 145, 147 
Vyshnegradskii, 172 


WEIERSTRASS, 25 


